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Abstract 

Traditionally motion and shading have been treated as two disjoint problems. On the one 
hand, researchers studying motion or structure from motion often assume uniform lighting 
conditions over the whole surface and good contrast at high spatial frequencies to minimize 
the effects of variations of the image irradiance of the patch as the surface moves. On the 
other hand, researchers primarily concerned with the shape from shading problem only consider 
static brightness data in order to recover the shape without considering the change of brightness 
induced by motion. 

A new formulation for recovering the structure and motion parameters of a moving patch is 
presented. It is based on using the spatiotemporal derivatives of irradiance that are computed 
from a time-varying irradiance sequence and combined into a differential constraint equation. 
The new approach determines the rigid body motion and the structure of the patch directly 
from the irradiance sequence using both motion and shading information. 

A new constraint equation, the full irradiance constraint equation (FICE), is derived. It 
links the spatiotemporal gradients of irradiance to the motion and structure parameters and 
the temporal variations of the surface shading. This equation separates the contribution to the 
irradiance spatiotemporal gradients of the gradients due to texture from those due to shading 
and allows the FICE to be used for textured and textureless surface. The new approach 
combining motion and shading information, leads directly to two different contributions: it 
can compensate for the effects of shading variations in recovering the shape and motion; and 
it can exploit the shading/illumination effects to recover motion and shape when they cannot 
be recovered without it. The FICE formulation is extended to multiple frames, and several 
methods are presented for efficiently computing the structure and motion parameters directly 
from a sequence of data. 

Overall, the examples demonstrate the superiority of the FICE algorithms to the classical 
CE algorithms in two distinct areas: the accuracy of the results is higher for textured surfaces 
and a solution can be determined in the case of textureless surfaces. 
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Chapter 1 



Introduction 



In recent years, a lot of attention has been focussed on the problem of recovering the three- 
dimensional structure and motion of objects from the corresponding two-dimensional planar 
projections of these objects on the image plane. Advances in robot and mobile robot technology 
have led to an increased demand for reliable algorithms that can determine structure and motion 
of the surrounding environment from time- varying sequences. The problem arises in navigation, 
e.g. (Bruss and Horn 1983, Negahdaripour and Horn 1987), where the motion of an observer 
in a fixed environment is computed, object tracking, e.g. (Roach and Aggarwal 1980, Schalkoff 
and McVey 1982), where one or more objects are moving against a slowly changing background, 
object recognition and any other area where a machine interacts with its environment with the 
aid of a vision system. 

Humans can rapidly integrate a large number of visual cues like stereo, shading and motion 
information and easily perform the task of recovering the three-dimensional world from a series 
of essentially two-dimensioned projections of the world on their retinas. Most current machine 
vision algorithms do not provide data fusion nor do they use multiple sources of information 
to compute the three-dimensional representation, i.e. the relative or absolute depths at each 
point of an object and its motion. Currently prevailing algorithms either use stationary cues 
to recover the structure, as in the shape from x problems, where x can be shading, texture, 
silhouettes etc. or use motion cues to recover structure and motion. Each formulation has 
its own set of assumptions and limitations and little has been done in the way of combining 
information of different areas to increase the quality of the solution. 

17 



18 CHAPTER 1. INTRODUCTION 

Traditionally motion and shading have been treated as two disjoint problems. On the one 
hand, researchers studying motion or structure from motion often assume uniform lighting 
conditions over the whole surface and good contrast at high spatial frequencies to minimize the 
effects of variations of the image irradiance (or more subjectively the brightness) of the patch 
as the surface moves. On the other hand, researchers primarily concerned with the shape from 
shading problem only consider static brightness data in order to recover the shape without 
considering the change of brightness induced by motion. 

This thesis will combine the use of motion and shading information to estimate the three- 
dimensional motion and structure of an object from its time-varying sequence of image bright- 
ness. The new approach, combining motion and shading information, leads directly to two 
different contributions: it can compensate for the effects of shading variations in recovering the 
shape and motion; and it can exploit the shading/illumination effects to recover motion and 
shape when they cannot be recovered without it. 

1.1 Problem Description 

The most general task would be to recover the three-dimensional motion and structure of 
arbitrary, multiple objects from the time-varying sequence of 2-D projections of these objects 
onto the image plane. Such a general formulation is not yet within reach and needs to be 
specialized further by the type of motion and by the type and number of objects. One important 
special case is rigid body motion. It provides a compact way of expressing motion compared 
to the dense, 2-D velocity field computed in optical flow methods, and is quite realistic for a 
large variety of objects that are rigid or can be segmented into rigid subparts. 

The extent of the task will be further restricted to a single moving rigid object to avoid 
dealing with the problem of scene segmentation which is beyond the scope of this thesis. Such 
segmentation can be facilitated by knowledge of the structure or depth map, or an estimate of it, 
since in many cases different objects can be separated on the basis of depth discontinuities. The 
availability of stereo data from which we can compute a depth map, or direct range information 
from sonar or radar scans, provides a possible solution to the segmentation problem. 

Finally, the last restriction that will be imposed is on the type of object. Unless we are given 
the structure information in terms of a depth map, in order to recover rigid motion, we need 
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to assume some a priori structure, e.g. planar or curved patches, and determine the structure 
parameters, like surface normals and curvatures. 

After taking into account the previous restrictions, the problem that we want to solve can 
be formulated in the following way: 

Problem: Given a time-varying sequence of image irradiance patterns obtained by imaging a 
single rigid object under inertial motion, and given an a priori structure and surface reflectance 
model, we want to recover the structure parameters and the instantaneous velocity of the object 
using shading and motion induced variations in the sequence of images. 

Since we are concerned with instantaneous velocity, a differential approach to the problem 
can be used and constraint equations that link the variations of irradiance, represented by 
its spatiotemporal gradients, to the temporal variation of the shading on the surface, can be 
derived. The problem can then be solved by a least-squares minimization of all the available 
data. The constraint equations can be extended temporally to several frames at a time, under 
mild additional assumptions, to increase the accuracy of the solution and decrease the noise 
sensitivity by augmenting the amount of data used in the minimization. 

1.1.1 Shading Constraints 

Shading is a physical phenomenon produced by the interaction of a light source with the surface 
of an object. The two-dimensional irradiance distribution, observed on the image plane, is 
related to the geometry and properties of the three-dimensional scene by the process of image 
formation (see section 2.1). Shading depends on many parameters of which the structure of the 
object and its position in space with respect to the light source and camera are also parameters 
relevant to the structure and motion recovery task, and need to be estimated. 

In the shape from shading problem the goal is to recover the surface normal at every point 
given a surface reflectance and a light source model. Depending upon the formulation and 
assumptions, the light source direction and its strength are either given or estimated. Work 
on illuminant direction determination can be found in Pentland (1984), Brown and Ballard 
(Brown et al. 1982) and Lee and Rosenfeld (1985), while Brooks and Horn (1985) simultaneously 
estimate the smoothest normal field from a Lambertian surface and the source direction. In 
many cases the strong assumption that the surface is textureless, i.e. has constant reflectance 
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properties across it, is required to solve the shape from shading problem. 

In the formulation of this thesis, an a priori surface model is required for which we can predict 
the spatial (across the surface) and temporal shading variations, given a reflectance model and 
a source model and position. In a multisensor situation the shape could be directly available 
in the form of range data, but in our case, image irradiance is the only input and a geometric 
model of the surface is assumed. In this context, the easiest surface models are parametrized 
patches that can be characterized by a small set of parameters, such as a unique normal for a 
planar patch or the normal and curvatures at a point of a quadratic patch. However, unlike 
the usual formulation of the shading problem, no restrictions are placed on the texture. 

The shading equation relates the irradiance at a given point of the image to the normal at 
the corresponding point on the surface and to the position of the light source. The brightness 
constraint equation, more correctly called the full irradiance constraint equation (FICE), intro- 
duced here expresses the relationship between the spatial irradiance gradient of a given image 
point, the motion of the corresponding point on the surface, the temporal variation of the shad- 
ing and the structure of the object. Given a point on the image plane, the shading constraint 
allows us to separate the temporal variations of irradiance due to the change in orientation of 
the surface, in motion with respect to the light source, from the spatiotemporal variations of 
irradiance due to the motion of the 3-D point being imaged (see Section 2.3). 

The framework developed in the thesis is valid for any parameterized surface patch although 
the algebraic complexity increases much faster than the surface degree. For clarity in expla- 
nation, most of development in the next chapters will be for planar surfaces when a specific 
surface model is required. Extensions to the quadratic patches case will be treated in chapter 5. 

1.1.2 Multiple Frames Processing 

One of the main problems in dealing with real data is the inherent noisy nature of the data. 
Careful acquisition of the data with good video cameras reduces the noise levels, but sensor 
noise, spatial non-uniformity of the optics and quantization noise due to the conversion of the 
continuous analog pixel intensity to an 8 bit quantity, always contribute to the overall noise in 
the sequence. 

The least-squares approach adopted in the algorithms allows us to minimize the effect of 
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the noise and obtain robust solutions. The technique is particularly robust in cases where the 
amount of data is very large in comparison to the number of parameters to estimate. The 
noise averaging property of the least-squares method is proportional to the square root of the 
amount of redundant data used. Given the spatial extent of the image, or equivalently the field 
of view, and a spatial sampling, the amount of data available per frame is fixed and the only 
way to increase it is to use multiple consecutive temporal frames. In the context of this thesis, 
"multiple" means at least three, since all motion algorithms require a minimum of two frames 
already. 

Additional, mild assumptions on the motion are required in the multiple-frame formulation. 
In order to keep the number of unknown parameters constant, and benefit from the additional 
data, the instantaneous translation and rotation vectors need to be constant in the time in- 
terval spanned by the frames used in the minimization. Although the assumption seems very 
restrictive, typical motions can generally be treated as constant over a brief period of time. 
If the sequence is captured with a regular video camera, each field has a duration of 1/60 s, 
and the motion constancy approximation is well justified, for normal motion speed, over three 
to seven frames. The previous assumption can be relaxed and algorithms can be designed to 
accommodate slowly varying motion. In essence, constancy is assumed when computing the 
instantaneous motion vectors of the central frame, but the motion vectors are allowed to change 
slightly when the temporal window is moved one frame and the motion parameters for the next 
frame are computed. In practice, the motion vectors are made up of a base term, or initial 
velocity term, and an update term that is computed in subsequent frames. 

1.2 Previous Approaches 

Two major classes of methods, two-stage and direct, are used in computing motion and struc- 
ture. In the two-stage methods, the structure and motion parameters are not computed directly 
from the brightness data but from an intermediate quantity that is derived from the data. Typ- 
ical preprocessing operations are the computation of the optical flow or the determination of the 
correspondence of discrete features. In contrast, direct methods compute the motion vectors 
and recover the structure directly from the brightness field, bypassing the preprocessing stage. 
Two-stage approaches are further split into discrete and difFerential methods. Discrete meth- 
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ods deal with sparse data, or features, and generally compute finite motions while differential 
methods can determine the instantaneous velocities. The direct approaches use a differential 
formulation exclusively. 

Although a lot of work in the field has been aimed at obtaining more robust solutions and 
reduced sensitivity to noisy data, it comes as a surprise that very little has been done to develop 
algorithms that use, globally, the temporally highly correlated data provided by time- varying 
sequences. Similar to the work in optical flow computation, algorithms have been developed 
to use multiple frames only in a weak way. The final estimate of the flow, or motion and 
structure parameters, is used as initial estimates in the processing of the next two frames of 
data. Many of these algorithms perform very well if a good enough initial estimate is available, 
and one can achieve a large reduction in the computation cost of subsequent estimates by using 
the previous frames estimates. Even though they are apparently using multiples frames, these 
latter algorithms are truly two-frame algorithms, and the quality of the successive estimates 
is highly dependent on the very first step and the ability to find an initial guess that is good 
enough for the algorithm. The Locally Constant Angular Momentum model (LCAM) of Huang 
(Huang et al. 1986) is such a pseudo multiframe algorithm, and is successively applied to pairs 
of frames in a sequence. True multiframe methods have been proposed and will be discussed in 
greater detail in section 1.2.4. 

Finally, we will review some previous attempts in including shading in the motion and 
structure problem. The first two papers (Webb and Aggarwal 1983, Aloimonos 1987) are truly 
alternate solutions to the shape from shading problems, since they only recover the structure of 
the object, but use motion information, while Nagel (1989) addresses the problem of recovering 
both structure and rigid motion using shading information but fails to present a realistic shading 
model and does not go beyond the presentation of a model leaving open the question of the 
solution. 

1.2.1 Discrete Motion Estimation 

In discrete motion estimation the problem is twofold. First, we have to determine good features 
and compute an accurate disparity field, i.e. solve the correspondence problem, then we have 
to extract the motion and structure parameters from the disparity field. Usually, features 
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are points, corners or lines. In the second stage, the problem is reformulated with respect 
to intermediate variables, often called essential parameters, that are solved first. The final 
operation is to extract the motion and structure parameters from these variables. 

Different schemes that use n points in p frames have been proposed, for example Tsai and 
Huang (1981) use 8 points in 2 images while Roach and Aggarwal (1980) use 5 points in 3 views, 
although, theoretically, 5 points in 2 frames are necessary and sufficient to recover the structure 
and motion parameters. Most of the proposed schemes lead to a system of nonlinear equations, 
although some algorithms for planar patches (Tsai and Huang 1984) only require the solution 
of a set of linear equations. The discrete approach has the major disadvantage of requiring 
the solution to the correspondence problem, that is the identification of corresponding image 
features between successive frames, a task that can be extremely difficult. By definition, the 
method relies on the spatial disparity of similar quantities and can run into trouble when feature 
points are too close together. Finally, although such schemes can be coded very efficiently and 
are computationally attractive, the method is very sensitive to noise and feature occlusion. 

Ullman (1983) introduced a multiframe algorithm for wire frame objects that is relatively 
insensitive to errors in the discrete measurements. His algorithm maintains a model of the 
viewed object and updates that model each time a new frame is obtained. The model is updated 
in a way that minimizes the amount of nonrigid motion necessary to account for the changes 
observed in the new frame. This position-based model was later extended to a velocity-based 
model by Grzywacz and Hildreth (1987). 

The problem of recovering motion and shape from widely separated frames, using known 
correspondence of several points in several frames, is similar to the classic photogrammetric 
problem of relative orientation that can be solved iteratively from the parallax equations, e.g. 
(Moffit and Mikhail 1980). Relative orientation is also a key issue in binocular stereo vision 
where we need to determine the relative orientation of one camera with respect to the other. 
Recently, Horn (1987) presented a new scheme that does not require an initial guess of the 
baseline and the rotation, for the recovery of relative orientation and discussed the existence of 
multiple solutions and their interpretation. 
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1.2.2 Rigid Body Motion From Optical Flow 

In some of the two-stage differential approaches (Longuet-Higgins and Prazdny 1980, Waxman 
and Ullman 1983), the optical flow and its derivatives are used to determine the motion and 
the structure of the scene. First applied to planar patches, e.g. (Waxman and Wohn 1984), the 
method has been extended to curved surfaces, e.g. (Wohn and Waxman 1985) for quadratic 
patches, by approximating the patch in the neighborhood of the line of sight by its truncated 
(usually at the second order) Taylor series expansion. These methods, although very elegant 
mathematically, suffer from pronounced sensitivity to noise since they use higher derivatives of 
the optical flow or assume, unrealistically, that the optical flow and its spatial derivatives are 
known exactly. 

Differential methods are very often based on a least-squares formulation that uses a dense 
field, like the optical flow, in conjunction with a constraint equation and, sometimes, a model 
of the structure of the patch. For example, Adiv (1985), Ballard and Kimball (1983) and Bruss 
and Horn (1983), do not assume any structure and use the optical flow field at every point of the 
image in order to minimize the sensitivity to noisy data while Netravali and Salz (1985) performs 
successive linearization around the current estimate without assuming any given structure. 

1.2.3 Direct Estimation of Motion and Structure Parameters 

More recently, several methods (Negahdaripour and Horn 1987, Horn and Weldon 1988) that 
directly use the image brightness and its gradients, without computing the optical flow have 
been proposed. These differential methods compute the motion parameters directly from the 
observed data without requiring the intermediate step of optical flow computation and rely on 
Horn and Schunck's constraint equation (Horn and Schunck 1981) rewritten in terms of the 
instantaneous rotation a>, instantaneous translation t and the reciprocal of the depth Z at each 
point. The brightness constraint equation links the spatiotemporal irradiance gradients to the 
rigid body motion parameters and to the object structure by the depth value at each point. 
The authors present iterative and closed-form solutions for planar surfaces (Negahdaripour and 
Horn 1987) and an iterative solution to the quadratic patch case (Negahdaripour 1986). 

Direct methods have also been used to estimate special motion of arbitrary surfaces. Horn 
and Weldon (1988) developed robust algorithms to recover pure rotation or pure translation of 
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a rigid body and demonstrated it on real data. 

1.2.4 Multiple-Frame Algorithms 

In addition to the apparent multiframe methods described in the introduction of this sec- 
tion, two multiframe, discrete or feature based approaches have been proposed, a geometrical 
approach and a stochastic estimation approach using Kalman filtering. In addition to these 
discrete methods that all rely on correspondence of features, Bolles and others developed a 
totally new framework that relies on very dense time sampling of image sequences. 

The geometrical approach of Shariat and Price (1986) is reminiscent of the previously men- 
tioned discrete approaches where there is a choice of how many points in how many frames to 
consider in order to have a fully determined problem. In their work, they describe a 5 frames, 
1 feature algorithm although they also mention possible use of 4 frames and 2 features or 3 
frames and 3 features. The model consists of a rigid object, with constant motion, observed 
from a temporally-uniform sampled sequence. The algorithm is based on the idea that, if the 
translation of the rigid body is compensated for, every feature on the object will trace a circle 
in space. They infer the necessary equations from geometrical relations. The convergence of 
the algorithm is highly dependent on the choice of the initial guess and no proof of convergence 
or uniqueness of the solutions is given. 

In their study Broida and Chelappa (1985, 1986) cast the problem of recovering motion 
parameters into a classical parameter estimation problem in the presence of noise and use a 
Kalman filtering and a maximum likelihood (ML) approach. In their model, they assume the 
structure of the object and the image coordinates of the object match points and only consider 
transparent objects to insure that every match point is always visible throughout the sequence. 
They present both a recursive solution (Broida and Chellappa 1986) that uses an extended 
Kalman filtering approach and a batch solution (Broida and Chellappa 1985) that computes an 
ML estimate of the parameters. Additionally, it is assumed that the noise level associated with 
the match points is known in the recursive approach, and that the initial angular orientation is 
available in the ML formulation. These techniques perform very well on a wire frame synthetic 
model and achieve results very close to the theoretical limit as given by the Cramer- Rao bound, 
but have the major drawback of assuming that the correspondence problem has already been 
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solved. 

Bolles and other (Bolles and Baker 1985, Bolles et al. 1987) opened up a new dimension in 
structure recovery by making time the preferred direction in their epipolar plane image analysis. 
Although their goal is not to recover motion but to recover the structure of the scene from 
motion sequences knowing the motion, their innovative use of the highly correlated temporal 
information might spawn novel ideas in the field of motion and structure recovery and is worth 
outlining here. In their method, they collect high frame rate video sequences, stack them into a 
spatiotemporal parallelepiped, slice the data along the temporal dimension to localize features 
in the temporal slice, deduce the 3-D location of the features and reconstruct the structure 
of the scene. In their initial paper (Bolles and Baker 1985), they assumed a viewing direction 
perpendicular to the direction of motion so that features were linear, but later (Bolles et al. 
1987) relaxed the assumption and only required a known arbitrary motion. 

1.2.5 Combining Shape, Shading and Motion 

1.2.5.1 Using Motion Information in Recovering Shape From Shading 

Webb and Aggarwal (1983) incorporate motion information in their solution to the shape from 
shading problem. Their analysis relies heavily on the assumption of rigid body motion, planar 
surface and orthographic projection, but does not require explicit knowledge of the rigid mo- 
tion. The algorithm applies to a Lambertian surface with arbitrary spatial variations in albedo 
and a single distant light source of known direction, and starts by considering a single point 
correspondence between two adjacent frames. The next step matches respective neighboring 
regions of the matched point, by assuming that the texture of the two patches are related by an 
affine transformation (small rigid body motion assumption) and that the image intensities can 
only change by a multiplicative factor, ff n r = (-p, -q, 1) represents the normal at a match 
point, these two constraints can be expressed by two conic equations in p and q. Once the 
normal at the matched point is computed, the constraints are propagated to the neighboring 
points within the match region. 

In contrast to the previous approach, Aloimonos and Bandopadhay (1987) use the full 
optical flow between two consecutive frames and recover both the shape of the object and the 
light source direction. Their formulation assumes that the images are viewed under orthographic 
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projection, that the surface has a Lambertian reflectance function with a constant known albedo 
and that the retinal correspondence, i.e. the optical flow, is known. The source direction can 
only be computed assuming a planar surface, while the computation of the shape only requires 
the surface to be smooth, with the added strong restriction that the shape is known exactly at 
the occluding boundary and that the normal at that boundary is parallel to the image plane. 

1.2.5.2 Motion Recovery Using Shading 

In a recent paper, Nagel (1989) introduces a new constraint equation that incorporates a shading 
term and discusses other extensions to the classical constraint equation. The stated goal of the 
paper is to derive a constraint equation using perspective projection, differential geometry and 
radiometric information. Unfortunately, the results fall short of the goals. His derivations, using 
differential geometry, tend to be more complicated than the traditional ones without increasing 
the intuitive feel for the results. It is a little surprising that he presents, in great detail, the 
derivation of the discrete motion equations, that require the solution of the correspondence 
problem, and then proceeds to differentiate the equations, obtaining the classical optical flow 
equations, and only uses the latter in the subsequent derivations. 

More serious problems with his approach are the use of Schunck's divergence equation 
(Schunck 1985) and his choice of a shading model. In the former instance, he fails to recognize 
that Schunck's equation is only applicable to texture density and is wrong in the context of an 
image brightness constraint equation. Contrary to his claim, his shading model has nothing to 
do with a Lambertian reflectance model of the surface but only relies on the well known optical 
imaging cos 4 a law where a represents the angle of the incident rays entering the lens assembly. 
His model is inaccurate, since, in practice, any well-built system can be calibrated to eliminate 
the cos 4 dependence and this phenomenon is negligible compared to the vignetting effect 1 that 
occurs in real lenses. 

All the problems outlined in this summary, compounded by the fact that Nagel does not 
even attempt to solve his model, clearly reduce the usefulness of his contribution. 



'Vignetting is the reduction of the light gathering process caused by the partial occlusion of inclined incident 
rays by components of the lens assembly system. 
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1.3 Machine Vision and Image Processing 

Strong competition exists between the machine vision and the image processing communities. 
Each camp advocates that their algorithms are more robust or realistic in solving the same 
problem i.e. recovering the motion between two frames of a video sequence. Much of the 
problem is the failure to recognize that, until recently, the two communities were solving different 
problems. 

Machine vision algorithms are trying to recover the true three-dimensional motion of the 
objects and use this information in path planning, collision avoidance or robot navigation, while 
the image processing algorithms, especially in the image coding field, only try to compute a 
two-dimensional field that reduces the entropy of the residual error signals between the original 
images and the coded one. The computed field may or may not have any relationship with 
the optical flow depending on the methods used, but is a valid field as long as a coding gain is 
obtained, compared to the non motion-compensated coding schemes. 

More recently, there has been a demand for the computation of true motion fields for 
television imagery and their use in motion-compensated interpolation, where the goal is to 
reconstruct as faithfully as possible the missing frames from temporally and spatially sampled 
frames. The newest algorithms are very close to the traditional machine vision optical flow 
algorithms. The new requirements have, in essence, started to bridge the gap between the 
image processing and machine vision communities. The former group has not yet started to use 
and compute rigid body motion since typical broadcast television imagery is not composed of 
rigid bodies. However, such an analysis is relevant when it comes to estimate and compensate 
for the motion of the camera with respect to the scene. The difficulty in that situation is that 
the scene is not static and it might be difficult, or impossible, to separate the rigid motion of 
the camera from the individual motions of the elements of the scene being imaged. 

1.4 Goal of the Thesis 

In this thesis, we investigate the problem of combining shading and motion information to re- 
cover the structure and motion of a parameterized rigid patch with respect to a fixed observer, 
from a sequence of image irradiance The goal is to recover the three-dimensional motion by 
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observing the image irradiance, its temporal and spatial derivatives and by assuming, or esti- 
mating, a shading model for the parametrized surface, given a set of lighting conditions. More 
specifically, the goal can be refined to the following three components: 

1. Establish a theoretical and computational framework to incorporate the a priori knowl- 
edge of shading conditions and use it, in conjunction with the motion cues, to refine the 
structure and motion solutions and extend them to previously difficult cases under the 
traditional assumption of brightness constancy. 

2. Evaluate qualitatively and quantitatively the importance and relevance of shading infor- 
mation with respect to the added complexity of the implementation and to the accuracy 
of the solution. 

3. Extend the analysis to longer sequences to use efficiently the additional data and improve 
the noise performance of the solutions for real data. 

Rigid body motion of objects with respect to a fixed observer is considered, as opposed to 
the problem of passive navigation, which considers rigid body motion of the camera with respect 
to a fixed background. In the former case, an object based reference frame is usually chosen 
and the motion of the camera, or observer, evaluated with respect to the stationary scene and 
light sources. In the latter case, a camera based reference frame is usually considered and the 
motion of the rigid bodies is expressed with respect to the camera. The latter distinction is 
very important because, in the case of passive navigation, with objects that have a reflectance 
function independent of the viewing direction, there is no motion induced shading variations. 

1.5 Summary of Results and Contributions 

A new formulation for recovering the structure and motion parameters of a moving patch is 
presented. It is based on the spatiotemporal derivatives of irradiance that are computed from 
a time-varying irradiance sequence and combined into a differential constraint equation. The 
new approach determines the rigid body motion and the structure of the patch directly from 
the irradiance sequence using both motion and shading information. 
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First, a new constraint equation (CE), the full irradiance constraint equation (FICE), is de- 
rived. It links the spatiotemporal gradients of irradiance to the motion and structure parameters 
and the temporal variations of the surface shading. This equation separates the contribution 
to the irradiance spatiotemporal gradients of the gradients due to texture from those due to 
shading and allows the FICE to be used for textured and textureless surface. 

The theory of the new CE is first developed for a generic surface and an arbitrary shading 
model; the minimization equations that define the motion and structure parameters, are derived 
in the general case. The models are subsequently specialized and various implementation are 
given for planar and quadratic patches; various shading models and numerical examples, that 
use synthetic and real data, are presented for different combinations of surface, light source 
geometry and surface reflectance. 

The FICE formulation is extended to multiple frames, and several methods are presented 
for efficiently computing the structure and motion parameters directly from a sequence of data. 
Several examples that focus on the behavior of the multiframe algorithms in the presence of 
noise are described. In particular, it is shown that the accuracy of the results is greatly improved 
in the presence of noise with the multiframe implementations. 

Overall, the examples demonstrate the superiority of the FICE algorithms to the classical 
CE algorithms in two distinct areas: the accuracy of the results is higher for textured surfaces 
and a solution can be determined in the case of textureless surfaces. 

In this thesis, at least three major contributions to the problem of recovering the structure 
and motion parameters were made: a constraint equation that allows the concurrent use of 
motion and shading information is formulated; a study of the importance of shading in rigid 
motion estimation is presented and situations in which the classical CE is too inaccurate to 
be used is determined; the FICE is expanded to multiple frames and a global formulation is 
provided. In addition to these contributions, insight is gained in the solution of nonlinear 
system of equations and in the challenge offered by real data. 

Prior to this work, there was no attempt to bridge the fields of motion estimation and shape 
from shading and a single algorithm or class of algorithms could not recover the structure and 
motion parameters from a moving patch with an albedo that can either be constant (textureless 
surface) or arbitrary but smooth. In addition, there was no real understanding of the domain of 
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validity of the classical CE that was used indiscriminately in all situations. Our work presents 
evidence that, although the classical CE is a very useful and excellent approximation in many 
cases, in other cases it is significantly in error and greatly improved results can be computed 
with the novel formulation. 

1.6 Thesis Overview 

Because the use of shading information and multiple frames is decoupled, the theory of shading 
information within a constraint equation and the of multiple frames is presented separately. 
These algorithms are independent and can be used separately or together. Algorithms us- 
ing shading information can be used with a minimum of two frames while algorithms using 
multiple frames can use the conventional constraint equation as described in Negahdaripour 
(Negahdaripour and Horn 1987). 

In Chapter 2 an overview of the image formation process and representation of motion fields 
is presented. The Full Irradiance Constraint Equation (FICE) and the incremental FICE are 
derived and an extension of these equations to second order derivatives presented. Presently 
available numerical methods to solve systems of nonlinear equations are surveyed and the prob- 
lem of efficient and robust computation of spatiotemporal irradiance derivatives treated. In 
Chapter 3, different shading models, the regular constraint equation, as derived by Negah- 
daripour (Negahdaripour and Horn 1987), and the FICE model are examined and discussed. 
Specific planar patch implementations and examples are featured in Chapter 4, while extensions 
to quadratic patches and corresponding examples are presented in Chapter 5. In Chapter 6, 
three multiple-frame constraint equations are derived and examples are shown. Finally, a 
summary of the results and the conclusions are presented in Chapter 7. 
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Chapter 2 



Problem Formulation for Two 
Frames With Shading 



Image formation produces a sequence of irradiance patterns from snapshots of a real three- 
dimensional object illuminated by light sources and moving in space. Motion fields mathemati- 
cally represent the motion of the image of the object and shading characterizes the "appearance" 
of the object interacting with its lighting environment. This chapter analyses the image forma- 
tion process, the representation of the motion of an object and the shading variations induced 
by motion, and presents algorithms that compute the structure and motion parameters from 
the irradiance sequence. 

The two-frame case is described in a generic way that does not rely on a specific object 
shape or shading model and introduces all the tools and mathematical techniques needed for 
the study. Algorithms are built incrementally from a simple, purely two-frame algorithm that 
uses motion cues and shading information to recover the motion and structure parameters, to a 
more complex formulation that recovers incremental motion from a stack of frames. The overall 
purpose of this chapter is to provide the theoretical framework and to set up a computational 
framework that is shape and shading model independent. 

33 



34 CHAPTER 2. PROBLEM FORMULATION FOR TWO FRAMES WITH SHADING 

2.1 Image Formation 

An analysis of the principles of image formation is important to the understanding of the 
methods for recovering structure and motion from brightness images. The analysis reveals two 
main aspects of the image formation process, the geometric process by which a point from the 
three-dimensional surface projects onto a point in the image plane, and the photometric process 
that determines the irradiance of the points in the image plane. 

2.1.1 Geometric Model 

The analysis of a real lens-based camera is complicated. For regular focal length and normal 
to wide angle lenses, the actual lens can be approximated by a pin-hole camera placed at a 
distance equal to the effective focal length F of the lens from the image plane. Since light 
travels in a straight line from a point in the object surface to a point in the image plane, the 
transformation from the three-dimensional world to the projected two-dimensional image is a 
perspective transformation defined by 

L = JL (2.1) 

F Rz 

where R = (X,Y,Z) T represents the coordinates of the 3-D point in the world frame (camera 
based coordinate system), r = (x,y,F) T the coordinates of the projected point and z is the 
unit normal on the optical axis (see Figure 2.1). The 3-D vector r represents the projected 
point in the image plane and the 2-D vector x = {x,y) T denotes the coordinates of the image 
point in the image plane, i.e. x is the restriction of r to the two-dimensional subspace of the 
image plane. 

Although only normal or wide angle lenses are taken into consideration, and therefore only 
the perspective projection model is used, it should be noted that zoom and telephoto lenses can 
under certain circumstances be nicely and more easily approximated by orthographic projection. 

An immediate consequence of the perspective projection equation (2.1) is that an image 
point with coordinates r can be computed uniquely from the coordinates R of the corresponding 
object point, but a unique object point cannot be determined from a given image point unless 
the distance along the ray is known (structure), since any point on the ray can be backprojected 
to the original point. The problem of recovering R from r is also known as the inverse optics 
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Figure 2.1: Perspective transformation 

problem. It is an ill-posed problem that cannot be solved unless additional assumptions or 
constraints are introduced. 

Pin-hole cameras have problems; if the size of the pin-hole is infinitely small, diffraction 
occurs, if the size is too big our analysis no longer holds and a point on the object is imaged as 
a small circle. Lenses overcome these problems to a large extent; they gather a finite amount 
of light while keeping the object surface in sharp focus. 

2.1.2 Photometric Model 

We will briefly show how to compute the image irradiance E, also informally called image 
brightness, and introduce the concepts of scene radiance and surface reflectance. 



2.1.2.1 Computing Image Irradiance 

The image irradiance, or apparent brightness of the surface, depends on the microstructure of 
the surface, the distribution of the incident light and the orientation of the surface with respect 
to the viewer and the light sources. It is formally defined as the power per unit area of radiant 
energy falling on a surface. The scene radiance L, or informally scene brightness, is defined 
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Figure 2.2: Image formation for a lens based optical system 

as the power per unit foreshortened area emitted in a unit solid angle, and is not an intrinsic 
property of the surface. It depends on the reflectance property of the surface and can change 
with motion and illumination. 

Let us consider an object patch SO that maps into the image patch SI through the lens (see 
Figure 2.2). Since rays passing through the center of the lens are not deflected, the solid angles 
Cl$ j and tig o subtended at the center of the lens by SI and SO are equal, i.e. 



Sin 



SI cos a 



Hso = 



SO cos e 



(F/ cos a)» -" ( (R ., )/cosa y 

where a is the angle between the incident rays from the light source and the optical axis, and 

is the angle between the surface normal and the rays at SO. Taking the ratio of the solid 

angles, we obtain 

£2 = /coBax /_F_V (22) 

SI Vcos^/VR-z/ 

Let L be the radiance of a point of the patch SO on the object surface. It is the result of 

the light from the sources of illumination being reflected from the surface of the object and 

depends upon the location and nature of the sources of illumination, and upon the orientation 

and reflectance of the surface. The amount of light SL emanating from the surface patch SO, 
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and concentrated on the image patch SI, is given by, e.g. (Horn 1986) or (Horn and S job erg 
1979), 

SL = L60 cos 07T ( p^r ) cos 3 a, (2.3) 

where d is the diameter of the lens. If we neglect the losses in the lens, the image irradiance E, 
defined as the ratio of SL and SI, can be expressed as 



= W. 



b - 1 '*{tr) cos ' a (2 ' 4) 



using (2.2) and (2.3). Equation 2.4 is one expression of the well-known cos 4 optics law. Very 
often the quantity F/d is used as a direct specification of a lens, and is called the effective 
F-number. In practice, systems are often calibrated to eliminate the cos 4 dependence. 

2.1.2.2 Scene Radiance and Surface Reflectance 

Radiance is a directional quantity that depends on the direction of the illumination, character- 
ized by the polar angle $i between the normal to the surface n and the direction of the rays 
L, and by the azimuthal angle <j>i between the perpendicular projection of L and any chosen 
line in a plane perpendicular to n. Nicodemus (Nicodemus et al. 1977) introduced the Bidi- 
rectional Reflectance Distribution Function (BRDF) that relates the incident irradiance SE in 
the direction (0i,<f>i), to the reflected radiance SL in the direction (0 e ,<f> e ) (see figure 2.3 for the 
notations). The BRDF is defined as 

MWW,**,*.)-*^ (2.5) 

and it determines how bright a surface illuminated in one direction appears when viewed from 
another direction. If the illumination source is distributed over a range of directions (#;,<&) 
(extended source), the total reflected radiance is obtained by integrating equation 2.5 over the 
solid angle of incidence. 

The BRDF is a complete description of the relationship between scene radiance and image 
irradiance but is too complex to use since, in practice, it needs to be determined experimentally 
for each type of surface. Fortunately, there are some types of theoretical surface reflectance 
that can be easily modeled and that approximate some actual surfaces nicely. We will present 
two of these surfaces, the ideal Lambertian diffuser and the ideal specular reflector. 
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Figure 2.3: Local geometry of incident and reflected rays for the definition of the bidirectional 
reflectance distribution function (from figure 6 of Horn and Sjoberg (1979)). 

An ideal Lambertian surface is a perfect difluser that appears equally bright from all viewing 
directions and reflects all incident light, absorbing none. Although no real surface exactly 
behaves like this, tightly pressed powders of highly transparent material like barium sulfate 
and magnesium carbonate come close. Matte white paint, opal glass, rough papers and snow 
are somewhat worse approximations. The BRDF of an ideal Lambertian surface has a constant 
value 1/w and the radiance of a surface illuminated by a single point source is given by 

L(6i, fc) = - cos 6iE = -(L • h)E. 

7T X 



The ideal specular reflector, very well approximated by polished metals, reflects all the light 
arriving from (#i,<fc) into the direction {9i,<j>i + x). Its BRDF can be expressed as 



BRDF ipeculaT (e u <t>i;O e ,<l> e ) = 



6(0 e - 9j)S{<t> e - 4>j - t) 
sin $i cos 0{ 



where S() represents the Dirac function. The scene radiance relates to the image irradiance 
simply by, L(0 e , <f> e ) = E(0 e , <f> e - tt). 



2.2. IMAGES, MOTION AND MOTION FIELDS 39 

2.2 Images, Motion and Motion Fields 

This section discusses the representation of three-dimensional motion, two-dimensional motion 
fields and their relationship. The special case of rigid body motion is developed in some detail 
and the relationship between motion and observed image sequences is discussed. 

2.2.1 Representation of Motion 

2.2.1.1 Image Motion Field 

We saw in section 2.1 that an image is formed by the projection of a three-dimensional scene 
onto the target of the camera. In general, each point in the image corresponds to a unique point 
on the surface of the object. Transparent or translucent surfaces violate that rule and will be 
excluded. The relative motion of an object and the camera results in the motion on the image 
plane of the projections of the object points. The motion field is defined as the two-dimensional 
vector field of instantaneous velocities of points in the image plane as a function of position and 
time and is denoted by v(x, t). These velocities are related to the three-dimensional velocities 
of the corresponding scene points through the perspective projection of the camera. 

The motion will generally induce a spatiotemporal variation in the image plane irradiance, 
and it is the analysis of these variations that allows us to estimate the motion which has 
occurred. It should be noted that the motion is not always observable from the image irradiance 
even though a nonzero motion field might exist everywhere. The case of a rotating textureless 
sphere illuminated by a fixed light source is such an example. The image of the sphere is 
nonuniform, due to shading effects, but the image irradiance does not change with time. 

A related concept, which is widely used, is that of the optical flow, defined as the apparent 
motion of the image brightness pattern (Horn and Schunck 1981). Optical flow is defined only 
in terms of the image and the apparent motion may be due to true motion, illumination effects, 
or both. The optical flow is not uniquely determined by its definition and does not correspond 
to any physical reality. 

When dealing with temporally sampled images a quantity of interest, closely related to the 
velocity field, is the displacement field. It establishes the correspondence between the points in 
the image at time to and the points in previous and subsequent frames at time to - kT, where 



40 CHAPTER 2. PROBLEM FORMULATION FOR TWO FRAMES WITH SHADING 

k e {...,-2,-1,1,2,...} and T is the temporal sampling rate. In the context of this thesis, 
motion field will be used to refer to either a velocity or a displacement field. 

2.2.1.2 Velocity and Displacement Fields 

The projection of each object point traces out a trajectory in the image plane during the time 
it is visible in the image. Trajectories can be specified by the function c(f;r ) which gives 
spatial location at time t of the point which was at the spatial location ro at time to- From 
this definition, it is clear that c(t ;r ) = r . Let us denote by ti(r,t ) the starting time and 
by tj(r,t ) the ending time of the trajectory of the point r. The velocity field gives the rate of 
change of position at a given time for each pixel on the frame i.e. 



▼(*f0= df C ( f 5 r o) 



= v(c(r;t ),*o) 
t=t 



and, as expected the displacement field is related to the velocity field by integration. If 
d(ti;r,*o) represents the displacement of r from the time to to t\ {t\ > to) 



d(*i,r,*o)= / v(c(r;t ),t)dt. 
Jto 



The velocity field corresponds to the projection of the motion of the points in space and 
can be expressed directly in terms of these points. Let R be a point in space with coordinates 
(X, Y, Z) in a reference frame attached to the camera. We saw in section 2.1 that the image 
of the corresponding point in the image frame is r = (x,y,F) = FR/(R • z) where F is the 
effective focal length of the lens and z is the optical axis. The point R is in relative motion 
with respect to the camera with the velocity v = R, where dotted quantities represent time 
derivatives. The corresponding motion of the projection of R on the image plane is r = (x, y, 0) 
(or i = (x,y) T ), if the analysis is restricted to the image plane subspace, and can be expressed 



as 



. _dr _ F dK 



dt R • z dt (R • z) 2 
This derivative can be rewritten in terms of r and rearranged: 






-^(•xfRxr)). (2.6) 

rL • Z 
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2.2.2 Parametric Representation of Motion Fields 

The previous description of the motion field is general and does not place any constraint on the 
local form of these motion fields. In fact, the degrees of freedom of the fields can be reduced by 
assuming a certain parametric representation associated with a restricted class of motion. One 
important subclass of motions is rigid body motion. 

If the point R is assumed to lie on a surface patch which can be considered as a rigid body, 
then the instantaneous motion of all the points of the patch can be described by 



R(i) = R = wxR + t = nR + t, 



(2.7) 



where 



n = 



u> z 



-U> z COy 

-w a 



\ 



v- 



U>u 



Wx 



J 



is the (3 X 3) skew-symmetric matrix isomorphic to the instantaneous rotation vector a> = 
(u> x ,u)y,io z ) T , and t represents the instantaneous translation vector. Substituting equation 2.7 
of the rigid body motion into equation 2.6 gives the expression of the velocity field of the 
images of points on the patch of rigid body in question, in terms of the three-dimensional 
motion parameters 

f=zx (rx -xw- — — - ) I . (2.8) 



( rX (^ XU? "R^))- 



The vectorial equation 2.8 represents the traditional optical flow equation and can be written 
in the more familiar component form: 




-('♦£) 



F 



W-u 



y 




(2.9) 



Appendix A presents a comparison between the equations of this section, which use the instan- 
taneous motion, and the optical flow equations which can be derived directly from the finite 
motion equation. 

Motion fields are most easily visualized by means of needle diagrams, an array of vectors 
indicating the magnitude and direction of the velocity on a selected grid of points (Figure 2.4 is 
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Figure 2.4: Needle diagram of a optical flow field representing a zoom along the optical axis 

an example of the flow field induced by a pure translation along the optical axis) or by a color 
image where the hue indicates the direction and the saturation the magnitude of the field. 



2.2.3 Time- Varying Images and Motion Fields 

Having characterized motion by means of motion trajectories and motion fields, we can now 
relate the observed time-varying images to these motion fields. As seen in section 2.1, im- 
age irradiance E(x.,t) is measured at each point of the image plane and the irradiance of the 
projected image point depends, in general, on the position of the point in space, on the local 
orientation of the patch, on the reflectance of the surface and on the light sources. The fun- 
damental assumption in classical motion estimation is that the image irradiance varies slowly 
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along the motion trajectory. In practice, shading variations have been ignored or considered 
negligible vis a vis the spatial changes due to the texture of the surface. In most cases, no 
quantitative or convincing qualitative explanations have been formulated to justify such an as- 
sumption. In a recent paper, Horn (1988) provides some justification for the brightness change 
constraint equation by showing that, in the case of a sinusoidal grating, it is possible to neglect 
the change due to viewing direction or illumination as long as there is good contrast at high 
spatial frequencies. Section 3.3 analyses the validity of such an assumption as a function of the 
texture on the surface. 

2.2.3.1 Classical Motion Constraint Equation 

Assuming that the irradiance of a patch remains approximately constant as the surface moves 
in the environment, that is dE/dt = 0, the frame difference signal can be related to the dis- 
placement and to the spatiotemporal gradients of the image intensity. Limb and Murphy (1975) 
and, to a greater extent, Cafforio and Rocca (1983) were the first to use implicitly what Horn 
and Schunck called the constraint equation (CE) that can be expressed in the form (Horn and 
Schunck 1981) 

E x • x + Et = 0. (2.10) 

In this equation E x = -r— - 1 -z— > ~z— I represents the spatial gradient of the irradiance E 

dm \ ox ay J 

with respect to x = («,J/) T , the image coordinates of the perspective projection of point R 

dE 
onto the image plane at effective focal distance F, and E t = -^- is the temporal gradient of 

the irradiance. 

Although many, more or less correct, derivations of the CE have been presented, the above 
formulation, to be meaningful, corresponds to tracking a specified point R = (X,Y, Z) on the 
3-D surface, or equivalently, if the point remains visible throughout the motion, its projection 
r = (x,y,F) T , or x = {x,y) T on the image plane and to consider its temporal variations. 
Let us consider the projected trajectory specified by the function c(t;r ). The value of the 
image irradiance JS(x,t) along the trajectory c(t;r ), E(c(t;r ),t) can be considered as a one- 
dimensional time signal denoted s(t\ r ). Then, the assumption of no change in intensity along 
the motion trajectory from the initial time t{ to the final time t/ is 

s(t;r ) = E(r ,t ), U(r 0t t ) < t < t f (r ,t ) (2.11) 
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and the constraint equation is obtained by taking the derivative of (2.11) with respect to t, 
applying the chain rule and evaluating at t = to. 

The constraint equation can also be interpreted in terms of a directional derivative of the 
brightness function E(x, t). If V a -E denotes the directional derivative of E with respect to the 
arbitrary vector a € R 3 , where R 3 = Rl pat iai X Rtime, then 

where VE is the spatiotemporal gradient of E(x,y,t), i.e. VE = (E x ,E y ,E t ) . 

If we consider the vector a = (x, l) r tangent to the motion trajectory, the directional 
derivative along this vector is given by 

which is equal to zero under the assumption of constant irradiance of the moving patch. 

2.2.3.2 Relationship Between Displacement Field and Image Sequence 

When dealing with a time sampled image, it is more natural to describe the motion in terms 
of a displacement field d = d(t - T;x,i). Under the assumption of irradiance constancy, the 
Displaced Frame Difference (DFD) denoted by D^(x,t, d) will be zero i.e. 

D B (x, t, d) = E(x, t) - £(x - d(i - T; x, t ), t - T) = 0. (2.12) 

A first-order Taylor series of the term E(x-d, t-T) in (2.12), produces an equivalent expression 
to the motion constraint equation (2.10) in terms of the displacement field and the DFD: 

D B (x,t,d) + d.V x ^(x,t-T) = 0. (2.13) 

If we assume that the velocity is constant in the time interval [t — T, t], d = vT = xT and the 
previous equation can be written as 

D£( ^ ,d) + x • V x £(x, t - T) = 0. (2.14) 

Equation 2.14 is equivalent to the motion constraint equation (2.10), as T approaches 0, since 

— — = lim E ^ ' ' — -. In practice, only equation 2.13 should be used when dealing with 
dt r->o T 
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sampled images. The infinitesimal motions need to be approximated by finite displacements 
and the DFD value is a much better approximation to the temporal derivative than the usual 
first difference (see section 2.5). 

A more general form of equation 2.13 is obtained by taking a Taylor series of the irradiance 
function E{x. -d,t-T) about the point (x - d , t - T) 

Dj5(x, t , d ) + (d - d ) • V x £(x - d , t - T) = 0. (2.15) 

Equation 2.15 is identical to (2.13) for d = but may be a better approximation if d is 
close to the true displacement because all the quantities are very small and the equation is an 
excellent approximation to the differential motion constraint equation. These equations will be 
reformulated directly in terms of the rigid body motion parameters in section 2.3. 

2.2.3.3 Constraint Equation for Sampled Images 

The constraint equation (2.10), derived in the previous section, only applies to the image as a 
function of continuous space and time. In the following development, the effects of quantization 
on the amplitude of the image irradiance value will be neglected and only the three-dimensional 
spatiotemporal sampling of the image sequence will be considered. Let A u be a sampling lattice 
associated with the intensity image u(x,<) in the 3-D space-time coordinate system. Let 
(T^, TZ,Tt) T be the sampling periods and let x; be a 2-D image point from the sampling lattice. 

rp 

The sampled intensity image is u a {x.i,t) = tt(x,f) where (xj,f) = (kT£,lTZ,mT*) G A u . 

If the spectrum of the continuous image is limited to a unit cell of the reciprocal lattice A* , 
the irradiance signal can be uniquely reconstructed from its samples by means of an interpolation 
formula of the form 

u(x,f) = ]T]u,(xi,T)$i(x- Xi,* -r) 



X.-.T 



where 



Jw 
and W represents the baseband in the image plane. However, if the original image is not suitably 

bandlimited with respect to the sampling lattice, error-free reconstruction is not possible and 

aliasing will occur. Spatial aliasing can be easily avoided by low-pass filtering the analog video 

signal before spatially sampling it. The main problem, in a regular video camera, is the temporal 
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sampling. Let us consider a 3-D signal u(x,t) obtained from a 2-D signal it (x) by uniform 
translation^ velocity v, u(x,i) = u (x - v(t - t j). The 3-D Fourier transform of u(x,t) is 
related to the 2-D Fourier transform of u (x)by the relation 

r[u(ut 9i v v ,wt)] = f [u (« I1 w„)]e- (w '"' + "' ,, ' ) ' ^A +w t% +w t ). 

If the signal u (x) is bandlimited to (|ri x |,|fi y |) then the signal u(x,t) is bandhmited to 
|fl x ,ft y ,fit| where 

Sit = Sl x \v x \ + fi> y |, (2.16) 

and temporal aliasing is avoided if 

\<U\ < n.K\ + tly\vy\. (2.17) 

Equation 2.16 shows that in the simple case of an object under uniform translation, the tem- 
poral bandwidth il t is a linear function of the magnitude of the translational velocity v and 
therefore, for velocity magnitude ||v|| greater than one, the signal should be sampled fast enough 
temporally to avoid temporal abasing. The temporal anti-aliasing condition described by equa- 
tion 2.17 is never met, or even approached, in present day video cameras and temporal aliasing 
will always be present. 

In practice the continuous signal u(x,f) is not reconstructed exactly from the sampled 
signal u s (xi,t) but u(x,t) = £ u 5 (x;,t)*;(x - x;,i - r) + e(x,f) = u(x,t) + e(x,i) where 
u is the interpolated value of the image irradiance, $ is an interpolation kernel and e(x,t) 
represents the interpolation error functional which may be partially due to abasing or partially 
due to a suboptimal interpolation kernel. The choice of * has a substantial effect on the 
accuracy of the solution. It will be discussed, in section 2.5 in the context of surface fitting 
and irradiance derivative interpolation and computation. For a sampled image, the motion 
constraint equation (2.10), can be expressed in terms of the interpolated continuous irradiance 
function and takes the form 

u t + v • V x u + e t + v • V x e = 0, 

but will usually be approximated by 

u t + v • V X S = 0. 
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2.3 Full Irradiance Constraint Equation 

In the previous section, the motion constraint equation was derived under the assumption of no 
change in intensity along the motion trajectory. In this section, a motion constraint equation 
called the full irradiance change constraint equation (FICE), that takes into account the change 
in intensity along the motion trajectory is presented. Mathematically, the FICE is the equation 
that defines the total derivative of the irradiance function E(r, t), i.e. 

dE(r,t) ^dEdx dEdy BE (21&) 

dt ~ dx dt dy dt dt 

Equation 2.18 will be derived geometrically next, to gain insights in its physical meaning. 

Let us consider a segment of rigid body (S,t ) at time to and the displaced segment (E,*i) 
at time <i = t + dt (see Figure 2.5). Let M(K Q ,t ) be a generic point on (S,t ) that projects 
to r (Ro,*o) on the image plane (II). Let P'(Si,*i) be the corresponding point on (E,*i) that 
also projects onto r , and P(S , *o) the point on (E,*o) such that P' = B(P) where B represents 
the rigid body motion of the surface S. By construction, P(S ,t ) and P'(Si,ii) represent the 
same point on the surface and have the same coordinates (a, /3), called surface coordinates, 
with respect to a coordinate system attached to the surface, the same albedo p = p\(ct,P) , 
but have a different surface radiance due to the different orientation of the patches (E,£o) and 



(S,ti) relative to the light source. If we denote by P(R,t) the projection of the point P of 
space coordinates R at time t, and 2<MP(R,f),tJ the corresponding image irradiance, we have 
the relation 

E(i"(Si,ti),ti) -E(p(Ro,«o).*o) = 

£(P'(Si,fi),*i) - £(m(S ,*o),*o) +£?(m(Ro,«o),«o) - ^(p(S ,«o).*o) • (2-19) 

v „ ' <■ „ ' 

A.E(Si,So) AE(Ro,S ) 

The left hand side of equation 2.19 represents the temporal variations of the irradiance of the 
point 5, of surface coordinates (a, ft), due to the change of orientation of the surface S, i.e. 
E(r , t ) - -E(ri, fi). The first part of the right hand side of the previous equation, A-E(S X , S ), 
represents the change of irradiance due to the motion of the observed point P. Since the 



1 p\(a,0) denotes the surface albedo at wavelength A and surface coordinates (or,/3). In practice, the continuous 
wavelength data are not available and only their integrals with a small number of filter functions corresponding 
to different channels are accessible. 
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P(S ,P) 




Figure 2.5: Relationship between the 3-D surface, the texture on the surface and the projection 
of the surface onto the image plane for the full irradiance constraint equation. 
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points P' and M project onto the same image point ro, AE(Si, So) = E(ro,ti) - E(ro,t ) and 
lim AE(Si,Si) = ^f ' . A£(R ,S ) represents the difference of irradiance on the points 

dt — >0 0t 

Pand M on the surface (S,i ) and lim (aE(R , S ) = E{r u to) - E(r ,to)\ = dr-E Io (r ,t ). 

dt — >0 V ' 

Taking the limit, as dt goes to 0, of (2.19) yields the FICE differential equation (2.18), that can 
also be written as 

E t .r + E t = E(r,t) = E, (2.20) 

where E represent the temporal variations of the shading model. 

Using the expression of f in terms of the instantaneous rigid body motion parameters (u>, t) 
(see equation 2.8), we can rewrite the FICE (2.20) in the form 



Et — v • u> 



s • t 



Rz 



- = E 



where 



/ _ 



s = (E r x z) x r = 



FE X 
-FK, 



\ 



\ xE x + yE y i 



and v = -s x — = 
F 



\ 



yE x - xE y 



(2.21) 



/ 



Equation 2.21 represents the FICE in the case of rigid body motion. Negahdaripour and Horn 
(1987) derived the equation in the constant irradiance case, i.e. when E = is assumed. 



2.3.1 Motion and Structure Equations 

A smooth C p surface can be represented by a p th order polynomial patch that corresponds to 
the truncated Taylor series expansion of the surface in the neighborhood of the line of sight. 
If the viewer frame is oriented so that the z-axis is along the focal axis of the camera, the p' 
order polynomial patch is given by 

d i+j z 



2 (x ,r) = z. + £ iJ £ n £g ] x>r>. 



The reciprocal of the depth can be expressed directly with a p th order Taylor expansion in terms 
of the image plane coordinates x,y. If 0(e p ) denotes higher order terms, the Taylor series of 
\jZ is given by the expression 



l = i__J_ v Ci > j gi+i (v z ) xi y j i C (f) 

Z Z Q Z .{ri x (* + j)\ dxidyi &+> K h 



(2.22) 
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If the nonlinear terms are neglected, the patch is planar and can be represented by 3 
quantities, Z = (0 Z Q ) the intersection of the patch with the line of sight, and Zx and Zy 
that give the orientation of the patch. The normal n to the patch can be expressed in terms 
of spatial derivatives of the depth Z by n = (-Z x - Zy 1) T . A planar patch representation 
can be assumed if the object under consideration is a plane or a polyhedra, or if the curvature 
of the nonplanar patch is small. Since a planar patch is characterized by a single normal, the 
surface radiance will be uniform across the patch, if the shading variations due to a nearby 
light source are neglected and the reflectance function, e.g. Lambertian diffuser, is independent 
of the viewing direction. 

A planar patch, in the viewer frame, can be represented by the equation 

Z = Z + Z X X + Z Y Y «$==► R • n = Z 

^=> ^-±(r-n) = Z (2.23) 

1 r • n 



Z FZ 

i.e. the reciprocal of the depth R • z = Z can be expressed exactly as a linear functional in the 
image coordinates r. 

If the second-order terms are included in the Taylor expansion, the surface is approximated 
by a quadratic patch that is specified by the point Z , the normal n at that point and three 
curvatures that can be reduced to two principal curvatures in an object frame oriented along 
the principal curvatures. Unlike the planar patch case, there is a different normal at every point 
and therefore there are shading variations induced by the curvature of the surface. As a result 
shading information is much richer, but is much harder to use since the normal at every point 
is required. 

2.3.1.1 Equation, Measurement and Parameter Counting 

Let us consider an arbitrary surface patch £ undergoing a rigid motion specified, at time t, by 
the instantaneous, constant, motion vectors t,u> and let us denote a its perspective projection 
onto the image plane. The surface S is specified by a Lambertian reflectance functional and a 
smooth albedo p\(a t /3), and is illuminated by a single light source characterized by it position 
1, in the camera centered frame, and its intensity L . For the purpose of counting equations, 
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parameters and measurements, the surface a is assumed to fully cover the image plane of size 
(n x m). 

The observables are, equivalently, the irradiance values of a sequence of images, the spa- 
tiotemporal derivatives of the irradiance or the irradiance values of the first frame and the 
temporal derivatives of the brightness values in subsequent frames, and we want to recover the 
motion parameters t,u>, the structure of the surface Z(X,Y), or a set of parameters charac- 
terizing the parametric surface patch, and, optionally, the direction and intensity of the light 
source. The problem is specified by two constraint equations, the FICE (2.21) that relates 
the spatiotemporal changes in irradiance to the motion and structure parameters and to the 
shading variations at every point of the projected surface <r, and the shading equation (SE) 
that specifies the value of the image irradiance at every point of <r from the source direction L, 
the surface normal n and the value of the albedo at the corresponding point of S. 

For a sequence of p frames, we observe, in general, px (n x m) independent irradiance values 
from which we can extract at most (n - 1) X (m - 1) spatial gradients and (p - 1) X (n X m) 
temporal derivatives. We need to estimate 6 motion parameters (the components of t and u>), 
4 parameters for the light source (position and intensity) 2 and n X to albedo and n X m depth 
values. Naive counting would associate an unknown value to the depth and albedo for every 
point in each p frames without realizing that, once a depth and albedo is associated with an 
image plane point, or equivalently to a surface point, their variations in subsequent frames are 
driven by the rigid motion and the points are tracked throughout the sequence. In practice, the 
number of unknowns fluctuates slightly since points of the surface disappear and appear at the 
edge of the image plane but this first-order effect can be ignore in global, average counting. The 
previous, naive, count would suggest that a p-frame sequence produces pxnxm independent 
measurements, 6 + 3 + 2nm unknown parameters and could therefore be solved using a minimum 
of 3 frames ! That analysis does not take into account that some parameters cannotbe estimated 
independently, that a sequence of p frames does not yield px nxm independent measurements 
in most cases and that, in practice, the ratio of independent observables to unknown parameters 
is too small to produce a meaningful, let alone robust, solution. Specifically, only the direction 
of the translation vector can be determined since t only appears as t/Z in the FICE, or to put 



2 For a distant source like the sun, only the direction I (2 unknowns) and intensity Lo of the source are relevant. 
For a nearby source, the full position 1 of the source and its intensity are required. 
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it another way, t can only be determined within a global scaling factor. Similarly, the light 
source position 1, or direction 1, and intensity L Q cannot be recovered independently since they 
only appear as a product, L l or L l, in the FICE and SE. Consequently, the intensity of the 
source will be set to unity. 

In order to build a robust solution, all the measurements, i.e. the full image cr, are used in 
the minimization and the number of unknowns is reduced by assuming a parametric surface 
patch, specified by a few parameters. However, parametric surface models not only reduce the 
number of parameters but can also reduce drastically the number of independent measurements. 
In some cases, for specific combinations of surface illumination, source models and geometric 
surface shape, the number of independent measurements is insufficient and the problem becomes 
underconstrained. Specific cases will be discussed in detail in chapter 4 for planar patches and 
in chapter 5 for quadratic patches. 

2.3.1.2 General Minimization Equations 

Given the irradiance values and the two constraint equations FICE and SE, we can either 
(a) minimize the square of the error in the FICE alone, (b) perform (a) subject to the SE 
constraint, (c) minimize a weighted sum of the square of the error in the FICE and the square 
of the error in the SE, or (d) minimize the square of the error in the SE alone. Scheme (a) is 
the traditional least-squares formulation, as used by Negahdaripour (1986) hi the case where 
E = 0, and is only applicable if the changes in illumination are negligible and the spatiotemporal 
variations are purely due to the texture. Algorithm (b) represents the general least-squares 
formulation that allows the determination of the motion and structure parameters for a patch 
with arbitrary smoothly varying albedo. The problem is set up as a constraint minimization 
where the unknown albedo values are determined by the shading constraint that relates directly 
the irradiance values on a to the albedo values on E. The constrained minimization (b) is turned 
into an unconstrained minimization with the introduction of a stabilizer functional derived from 
the shading equation. Scheme (c) is the "regularization" formulation of algorithm (a). Scheme 
(d) only recovers the normal of the surface and ignores the motion information and will not be 
further discussed 3 . Each of the previous minimizations can be performed, when appropriate, 



3 Brooks and Horn (1985) used this approach to recover shape and source information from shading. 
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with added parameter constraints like ||n|| 2 = 1 or ||L|| 2 = 1. These additional constraints are 

required if unit vectors like the normal n or the source direction L are estimated. In general, 

the FICE depends both on the normal and the unit normal and either can be recovered with an 

unconstrained minimization for n or an equivalent constrained minimization for n. Once n or n 

is determined, the other quantity is readily available since n = t— t: and n = - — -. However, in 
' |n|| n-z 

practice, the complexity of the algebra and the numerical implementation, and ultimately the 
numerical accuracy of the estimates, is highly dependent on the type of minimization performed. 
More specifically, if we let 

A =E t -(v.«)-i(s.t)-£ 



II 



A 2 dr 



B =E{x,t)-px(a,l3)e{L,n) 

C = ||fi|| 2 - 1 and D = ||L|| 2 - 1 
the following unconstrained and constrained minimizations can be performed : 



(a) 



vainA 



minA 

subject to B = 0, Vr 6 a 
min(A + aB), a weighting constant 



K I+ IL 



nun 



n(r)Bdr 



as well as the corresponding constrained minimizations that include additional parameter con- 
straints like C and D, 



(a') 



(*') 



ram(A + XC + vD) 



minA 

subject to C = 0, D = 

min-A 

subject to B = 0, Vr € cr 

subject to C = 0, D - 

where A and v are Lagrange multipliers and /z(r) is a Lagrange multiplier function. 



mm 



(a + If (i(r)Bdr + XC + vl)\ (2.24) 
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The minimization equations developed in this section use a generic expression for the recip- 
rocal of the depth, 1/Z = C(«>*)> &nd a generic shading model, E(r,t) = />>(<*, /3)e(L,n) and 
its associated temporal derivative E = dE(x,t)/dt = p A (a,/3)e t (L,ii,u>,t). In general, shading 
depends on the source direction L, and therefore 1 and R, the unit surface normal n and the 
surface albedo p\(a,/3) , assumed to be a smooth functional of the surface coordinates. The 
temporal derivative of the shading equation additionally depends on the motion vectors u? and 
t. For the parametrized patches considered in this work, 1/Z can always be expressed in terms 
of the normal n at a specific point and other optional variables represented symbolically by *. 
For example, n is the unique normal and * is nil for a planar patch, while n is the normal at 
the point of expansion of the Taylor series for the patch, and * represents the curvatures at the 
same point for a quadratic patch. 

The next section presents a symbolic solution to the minimization problem 6' where the 
light source direction L is known. This generic solution outlines the general procedure used 
to solve such a minimization problem and gives an idea of the complexity of the resulting 
equations. Specific minimizations, with explicit structure (1/Z) and shading model, and their 
numerical implementations, will be presented in chapter 4 for planar patches, and in chapter 5 
for quadratic patches. 



2.3.1.3 Generic Solution to a Constrained Minimization Problem 

Let us consider the minimization problem b' with the only parameter constraint C = and let 
us assume that the reciprocal of the depth 1/Z is only dependent on n, i.e. 1/Z = C( n )- If we 
let (?(u>,t, n) = E t - (v • u>) - C( n )( s • *)> &' can De written in the expanded form 

£ = imnf[ ((G(u, t, n) - /> A e t (L, n, «, t)) ' + /i(r) (e(t, t) - p x e(L, n))J dr + A(||n|| 2 - l) 

(2.25) 

For an extremum of £ we must have 



d£ n d£ „ „ dS n 

— = 0, -57 = 0, and — = 
ou> at on 
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that result in the three vectorial equations 4 

// ( V + PA fe) (<?(«.*'*)- *»«)* = ° < 2 - 26 ) 
Jj (^(n)s + p x ^j (G(u,,t,n)- PA e)dp = (2.27) 

where I3 represents the 3x3 identity matrix. 

The Lagrange multiplier A and the Lagrange multiplier functional fi(r) can be eliminated 
from (2.28) by talcing the dot product of the vectorial equation with two vectors (e.g. n and L) 
and by solving the resulting linear scalar system in the unknowns A and / / /x(r). The choice of 
the vectors, used in the previous dot product, is shading and structure dependent and produces 
a more or less complex linear system. Once the system is solved, the solution is plugged back 
into (2.28) and we are left with a nonlinear system of three vectorial equations in the unknowns 
u>,t and fi. The specific degree of nonlinearity of each of these variables depends on the models 
used but, in general, all the variables will be at least quadratic and only a global, nonlinear 
method can be used to solve the system. In some cases, with some simple shading models, the 
system of equations is partially linear in some of the variables and iterative mixed (linear and 
nonlinear) methods can be used in these instances. 

In all cases studied, the system of vectorial equations is too complicated for a direct vectorial 
solution. The vectorial system is transformed into a scalar system by projecting the vectorial 
equations onto suitable axes. In many instances, the best projection directions are the orthog- 
onal unit vectors of the camera frame of reference (x,y,z). However, substantial algebraic 

simplifications can be achieved with some shading and structure models by using privileged 

8€ dS n , d£ n 

directions. In particular, some of the vectorial equations 7— = 0, 7— = and — - = contain 

an an #L 

rank deficient matrices whose eigenvalues are easy to compute, and projecting in the direction 

of the null space of the matrix and its orthogonal direction, results in significant algebraic sim- 

pUfications. Once the vectorial equations are projected, the system consists of a rational or 

polynomial system of equations in the unknowns u xi u> y , w z ,t xt t y ,t z , n x , n y and n z , for a planar 

patch for example. This system <S can be represented globally, and more generally, by the 

4 For clarity's sake, the explicit dependency of some of the functional on the various variables will be dropped 
when no confusion is possible. 
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vectorial equation f(x) = where f = (/i.../„) r is the vector of nonlinear functionals in the 
unknowns x = (x 1 ...x n ) T G R n . In the next section we review some of the methods available 
to solve nonlinear systems and discuss their drawbacks. 

2.3.2 Discussion on Numerical Methods for Nonlinear Systems 

The general characteristic of all available methods for solving nonlinear system of equations, 
or a single nonlinear equation for that matter, is the recursive nature of the solution and the 
lack of guaranteed global convergence. The convergence properties vary from algorithm to 
algorithm. Very often the radius of convergence is fairly small and these algorithms perform 
poorly for rough surfaces and fail to converge to the global minimum unless the initial estimate 
is close enough to the solution. This problem is exacerbated by the high dimensionality of the 
parameter space and the degree of nonlinearity. 

Two main classes of methods are used in solving such systems: a minimization approach 

n 

where the system is solved by least-squares minimization, miny(x) = ^ fa (x), x £ R n and 

i=i 
a direct approach where the system is solved recursively while keeping the structure of the 

equations intact. 

2.3.2.1 Minimization Techniques 

Minimization techniques can be decomposed into several classes of methods: descent methods, 
conjugate-direction methods and Gauss-Newton type methods. Descent methods for </(x) are 
algorithms for which the iteration decreases the function value at each step i.e. 

9(* {k+1) ) < S(x«) (2.29) 

where flf(x) : R n — ► R, is a real-valued function. The general form of the iteration is 

x (*+i) = X W _ a*^ 1 V x 0(x<*>) 

where C;, is a symmetric, positive definite matrix. A simple descent method is damped New- 
ton's iteration where C fc is the Hessian of #(x). The damping constant afc varies the step length 
and is chosen in accordance with (2.29). A simple and widely implemented method is steepest 
descent where C^" 1 is the transpose of the Jacobian of g. Steepest descent chooses the direction 
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of maximal local decrease of g for each iteration. It breaks down if the initial estimate lies 
in a neighborhood of a nonzero minimum of g, and can be very slow. More specifically, in 
the quadratic case g(x) - |x T Qx - x T b, the convergence rate depends on the value of the 
eigenvalues of the positive definite n X n, matrix Q. The convergence rate of steepest descent 
slows as the contours of #(x) become more eccentric and conversely the algorithm converges 
in one step for circular contours. In particular, even if n — 1 of the n eigenvalues are equal 
and the remaining one is a great distance from these, the convergence will be slow and hence 
a single abnormal eigenvalue can destroy the effectiveness of steepest descent. On the other 
hand, steepest descent performs nicely when only a poor initial approximation of the solution 
is available. Levenberg-Marquardt (More 1977) is a composite algorithm that includes a pa- 
rameter controlling the form of the iteration. For high values of the parameter, the iteration 
is like a steepest descent step while for a zero value of the parameter, the method reduces to a 
Newton step. This algorithm provides the speed of the Newton's iteration while avoiding the 
unpredictability of Newton's in the large residual problem where a steepest descent iteration is 
more favorable. Another highly popular method that is much faster than steepest descent is the 
Davidon-Powell-Fletcher (Fletcher and Powell 1963), which uses a variable metric algorithm. 
The second class of methods are the conjugate direction methods. They were initially 
developed for linear system of equations. For the minimization of a quadratic functional g(x) = 
|x r Ax — b r + c, A symmetric, positive definite matrix, the basic iteration is 

X (*+D = x « - a fc p< fe > 

where the p( fe ) are A-orthogonal and a k = (AxW -b)p( fe )/(ApW) pW. The main advantage 
of this method is its speed of convergence. If the function g is quadratic, {x*} converge to the 
unique minimizer of g in at most n steps, where n is the number of equations. 
The third basic method relies on the Gauss-Newton iteration, 

x (fc+i) = x (fc) + s (fc) 

where s^ is defined by the equation 

J(x«) r J(x( fc ))s« = -J(x( fe )) r f(x( fe )) 

and J(x( fe )) = f'(x( fc )) is the Jacobian of f at x^. The main difference between it and Newton's 
is that the Newton model is based on the assumption that g can be adequately modeled by 
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a quadratic function, while Gauss-Newton results from the stronger assumption that f can 
adequately be modeled by an affine function. Augmented Gauss-Newton methods exist in 
which part of the Hessian matrix is computed exactly and part is approximated by a secant in 
a fashion analogous to the Davidon-Fletcher-Powell algorithm. 

In our experiments, we implemented a modified Levenberg-Marquardt method (More 1977) 
when casting the equations into a minimization problem. This algorithm is reasonably fast, has 
nice convergence properties while avoiding the implementation complexity and slowness of the 
conjugate direction algorithm. 

2.3.2.2 Direct Method for Solving Nonlinear Systems 

Two radically different classes of methods are discussed. The first class of algorithms is based 
on the Newton-Raphson iteration and is closely related to the minimization methods described 
in the previous section. Not too surprisingly, the algorithms' performances are very similar, in 
terms of speed and region of convergence, to their minimization counterpart. The second class 
of algorithms is composed of algorithms based on homotopy methods. 

The basic iteration is the Newton-Raphson, x( fe+1 ) = xW + 8^ where the direction 6^ is 
defined by the linear equations f(x( fc )) + JS^ = where J represents the Jacobian matrix of the 
system of equations f. A common modification is to retain the direction given by 6( k > but restrict 
the length with the introduction of a parameter \( k \ x ( fc+1 ) = x< fe ) + \( k )S( k h The parameter is 
chosen in a way that decreases the value of g at each iteration (descent method). The problem 
with this basic iteration is that the solution may converge to a point that is not a stationary 
point of <?(x). The basic iteration can be modified and improved with the introduction of a 
switch parameter /x. For /z = 0, the classical iteration is unchanged while the direction 6^ ' 
tends to the steepest descent direction for large values of p.. A good implementation of this 
type of method is Powell's hybrid method (Powell 1970). In essence the hybrid method is very 
close to Levenberg-Marquardt but it does not require the explicit expression of the derivatives 
and uses successive values of /»(xW to build up a numerical approximation to the derivatives. 
The method does not impose the reduction of the sum of the squares at each iteration because 
this technique depends on the scaling of g and might cause convergence to a point at which the 
equations are not satisfied. 
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We implemented this method and observed that the results were very similar to the modified 
Levenberg-Marquardt minimization method in terms of speed and convergence but turned out 
to be slightly more flexible when dealing with mixed systems of linear and nonlinear equations. 

2.3.2.3 Homotopy Methods 

Homotopy methods, also known as continuation methods or methods of incremental loading, 
rely on a totally different mechanism to solve nonlinear equations. They construct a continuous 
map $ (the homotopy) from a simple function s(x), with known zeros, to the function f(x) 
with the unknown zeros. More formally, let B be a Banach space and let s and f be denned on 
B. The homotopy map is defined by 

$ : B x [0, 1] — > B 

(x,A) — > $(x,A) 

such that $(x, 0) = s(x), $(x, 1) = f (x). Then, by solving the equation $(x, A) = in B x [0, 1], 
one attempts to move from the known zero of s(x) (at A = 0) to the unknown zero of f(x) (at 
A = 1). There is a considerable amount of theory concerning when this procedure will work 
(Ortega and Rheinboldt 1970) but in general, moving from a zero of s(x) to a zero of f (x) may 
or may not be possible. For the calculation of fixed points, the supporting theory is much more 
satisfactory. Chow (Chow et al. 1978) have proved that computing a fixed point of f merely 
amounts to tracking a smooth zero curve of p a (A, x) = A(x - f (x)) + (1 - A)(x - a), where the 
index a represents the start of the curve, and global convergence is guaranteed with probability 
one. The same algorithm can be used to find zeros also, but then the global convergence is not 
guaranteed. The homotopy map for zeros is /> a (A,x) = Af(x) + (1 — A)(x - a). 

The earliest implementation of a similar, simplified, algorithm was done by Deist and Sefor 
(1967) but the idea underlying the method was first described by Davideko (1953). The success 
of their method largely depends on the cleverness of the implementator: given a set of nonlinear 
equations, one has to modify it to a form that can be handled analytically by introducing a set 
of parameters. These simpler equations, of which a solution can be found, are then changed to 
their original form while simultaneously tracing the roots. The solution amounts to the solution 
of a first-order system of ordinary differential equation. Their approach is less systematic than 
the homotopy mapping and only really works well for systems of equations that can be simplified 
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by a single parameter. 

Our implementation is based on the homotopy algorithm described by Watson and Fenner 
(1980). It produces nice results in some cases but at a very high computational cost. The 
algorithm needs to perform fairly precise numerical integration in the root tracing process. 

2.3.3 Dynamical Frame Unwarping Incremental FICE 

The full brightness equation (2.21), developed in the previous section is afflicted by several major 
problems such as lack of flexibility and extendibility to multiple frames and poor performance 
for widely separated frames. The FICE was derived in the two-frame case and is based on the 
differential motion occurring between these two frames. However, in practice, there is no control 
on the frame rate and the resulting motions are far from being infinitesimally small. In addition, 
we want to compute the motion parameters at any instant in time (not necessarily aligned with 
a given input frame) and we want to easily extend the constraint equation to multiple frames. 
The first issue can be addressed by the use of multiresolution methods where the brightness 
images are filtered and spatially sampled to contract the motion. The second problem can, 
apparently, easily be solved by brightness values and gradient temporal interpolation 5 although 
a constraint equation formulation with a flexible virtual reference frame is easier to understand. 
The last and most important problem, the use of multiple frames in conjunction with the FICE, 
is unsatisfactory with the previous FICE, as shown in chapter 6, and a more sophisticated 
constraint equation is required. Most of the difficulties outlined here can be resolved with the 
use of a Dynamical Frame Unwarping (DFU) type of constraint equation, which is presented 
next. 

The DFU scheme makes use of a displaced frame difference (DFD) type of quantity (such 
a DFD was introduced in section 2.2.3.2) in conjunction with an incremental computation of 
the motion parameters. In the incremental scheme, only the difference between the current 
estimates of the motion parameters and the true parameter is computed. Equation 2.15 is an 
example of such a formulation in the special case of optical flow computation from two known 
frames at time t and t + T, under the assumption of brightness constancy. The proposed DFU 
is best described in the context of optical flow recovery at time t from two successive arbitrary 



The general subject of interpolation and surface fitting will be discussed in section 2.5.1. 
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Figure 2.6: Geometry of velocity field frame with respect to the preceding and following bright- 
ness frame 

frames at times t\ and t x + AT. The shading variation effects due to surface motion will be 
discussed throughout the presentation of the optical flow model, while the specialization of the 
equations to rigid body motion will be treated last. 

Let A u be a sampling lattice characterized by the sampling vector (T£,TZ,Tl) and as- 
sociated with the given brightness frame u(x.,t) and let A r be characterized by the vector 
(T£,T£,T£) the sampling lattice associated with the estimated velocity field frame. Let 

r = — - [™J be the normalized (with respect to the interframe distance T,) temporal dis- 

r r 

tance between the motion field, at time t, and the preceding image (see Figure 2.6). With the 

definition of t, the preceding image is indexed in time by <_ = t - tT* while the following image 
is indexed by t + = t - (1 - r)T$. 

The motion field d t , estimated from the image fields u(x,t + ) and «(x, f_), is computed 
at every point (x;,f) of the A r lattice. Assuming linear motion trajectories, the 2-D vector 
field of displacement is defined by d t = {d ( (x;,f), (xj,f) G A r } and is such that V(xj,t) G 
A r , d t (xi,t) is the displacement between the point (x* - Td t (x;,£),f_) in the preceding frame 
and the point (x; + (1 - r)dt(x t -,i),f + ) in the following frame. The displaced frame difference 
D(xj,f, r, d t (x,-,<)) is defined by 

D(xi,i,r,d t ) = tt(xi + (l -r)d t (xi, <),<+)- u(xi - rd^x,-, <),<_) 

where u(xj,ffc) denotes the brightness value at site (x{,£fe) $ A u retrieved by spatial interpola- 
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tion. The DFD represents the unwarped difference between the preceding and following fields. 
Conceptually, the preceding and following fields are realigned so that every point in the same 
spatial location matches. If the unwarping is perfect, the DFD only represents the variation in 
shading due to the change in orientation of the 3-D point X; that projects identically into Xj 
in the frames i_ and t + . Under the constant shading assumption, the DFD is identically zero 
for perfect unwarping. 

Let c\( n )(xi, t) be an estimate of the displacement field at the n th iteration. Using afirst order 
Taylor expansion of u(x;,*), D(xi,i,r,d t ) can be approximated by the known D(x;,t,T,d( n )) 
and the spatial derivatives of u(xj,£). In fact, 

D(xi,t,r,d t ) = fi(x i + (l-r)d(")-(l-r)(dH-d t ),f + )- 

u(x,- - -rd< n ) + r(d(") - d t ),*_) 
= D( Xi ,t,r,d(")) - (d(") - a t )(rV x u(x,- - Td< n >,<_)+ 

(1 - r)V x n( Xi + (1 - r)dW,* + )) + 0(||d<»> - d t \\). 

At the optimum, i.e. when d t = d t the true motion field, the DFD — D(x;, t , r, d t )-is zero under 
the brightness constancy assumption and is equal to E, the temporal variation of shading, in 
the general case. In the optical flow case, the DFU full irradiance constraint equation takes the 
global form, 

D(x i ,i,T,dW)-(d(")_d t )(rV x ii(x l -TdH,f_) + (i_ r )v x ii(x i +(l-r)d(' l ),« + )) = E. (2.30) 

Equation 2.30 can be rewritten in the form 

E = D(x,-,<,r,d("))-(d(")-d t )(v x ii(x i + (l-r)d( n ),f + )+ 

r(v x ii(xi-TdW,t_)-Vxfi(x< + (l-r)dW,* + ))j 

= D(x i .,t,r,aW) - (dW - d t ) L - r°^pj (V.^x, + (1 - r)4W,t + )) 

~ D( Xi ,*,r,d(")) - (dH - d t )(v x fl(xi + (1 - r)d("),« + )). 

The latter approximation is valid for a motion field d t - (dx t ,dy t ) that is either slowly varying, 
i.e. || Vda;t|| and || Vdy t || are small, or locally translational or constant. The relationship between 
gradients and Hessians of corresponding points is presented in appendix B. 
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The preceding development makes it quite apparent that the concept of a dynamically 
unwarped constraint equation is not limited to two frames. Under the assumption of constant 
motion (or linear trajectories in the optical flow case), it is fairly obvious how to define a DFD 
with as many unwarped frames as wanted. Once un warped by their respective displacement 
with respect to a central, possibly virtual, frame, all the frames are spatially aligned and can be 
stacked without any problem. If the alignment is perfect, the only irradiance difference between 
each unwarped frame is due to the underlying changes in surface orientation that can readily be 
computed from the rigid body motion parameters. The multiframe DFU constraint equation 
is developed in detail in chapter 6. 

On the one hand, the DFU concept is very attractive because of it extendibility and flexibility 
as well as its increasing accuracy since, at every step, the quantities are smaller and the DFU 
equation is a better approximation to the original differential equation. On the other hand, 
the computing burden is far higher since, at every iteration, gradients and brightness values at 
different locations not belonging to the image sampling lattice need to be computed. Section 2.5 
will describe how such computations can be achieved in an efficient manner. 

Having derived a DFU constraint equation for the optical flow case, our attention will be 
restricted to the direct motion estimation and similar results derived in the rigid body motion 
case. 

Let s(r,d t ,t) = (V x u(x + d t ,t) X z) X r and v(r,d t ,*) = s(r,d t ,t) X r and let (u>,t) 
represent the true motion parameters, (u>,t) the estimated parameters and (o?( n ),t( n )) the 
estimated parameters at the n th iteration. With this notation, the dynamical frame unwarping 
constraint equation (2.30) can be written, for the rigid body motion, in the form 

D( Xi ,f,r,d(")) + (rv(xi,TdW,*_) + (1 - r)v(x i ,(l - r)d("), *+))(<» - w)+ 
^(Ts( Xi ,rdH,t_) + (1 - r)s(xi,(l - r)dM,* + ))(t< n > - t) = E 

and can be approximated, if the field is either locally constant, translational or slowly varying, 
by the simpler expression 

E = D(x i ,<,r,d(")) + v(x i ,(l-r)d("),f + )(a ? (")-u ? ) + is(x i ,(l-T)dH,t + )(tW-t). (2.33) 

Both (2.32) and (2.33) require the computation of the v and s fields at each iteration 
since both fields depend on the current estimate of the displacement. The latter equation only 
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requires half as much work as the former equation in computing the fields v and s since they 
are only evaluated at time t+. The efficiency in calculating both fields essentially depends 
on the efficiency in computing the unwarped spatial gradients V x u(x,rd\ n >,t_) in frame £_ 
and V x u(x,(l - r)d( n \t+) in frame t+. Section 2.5 presents a fast method for computing 
the new unwarped gradients when the change in incremental displacement is moderate, that is 
||d( n+1 ) - d(")|| < 1 pixel. 

2.4 First and Second Order Constraint Equations 

In the previous sections, we derived constraint equations that were based on a first-order ap- 
proximation of the irradiance function, or more specifically, only used spatio-temporal gradients 
of the irradiance. In this section, a second-order formulation that uses the gradients and Hes- 
sian of the irradiance function is presented and an approximation of the second order terms 
by a linear combination of the gradients of the preceding and following irradiance frames is 
used. The second-order formulation is first derived for the classical constraint equation case, 
i.e. under the assumption that the temporal variations of the shading dE/dt are zero and non- 
incremental displacements are computed. The new constraint equation is then extended to the 
incremental displacement computation by use of the DFD formulation. It should be noted that 
the equations initially presented in this section are only meaningful for irradiance functions 
continuous in both time and space. The situation is quite different once the equations are dis- 
cretized on a grid. In particular, the choice of stencils or masks, for the computation of the 
spatiotemporal gradients truly determine the order of the discrete equations. This issue will be 
further discussed in section 2.4.3. 

2.4.1 Classical Continuous Second-Order Constraint Equation 

Let u(r,t) represents the irradiance at time t at the image plane location r and u(r + dt,t + dt) 
the irradiance at time t + dt and spatial position r + d t . Assuming that d t is a first-order 
differential quantity, u(r + d t ,t + dt) can be expanded into a second-order Taylor series around 
the point (r, t + dt) 

u{t + d u t + dt) = u{r,t + dt) + V T u(r,t + dt)d t + ^df VV r u(r,* + dt)d t + 0(||d t || 2 ). 
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Assuming that d t represents the displacement field between the frame t and t + dt, and that 
the temporal variations of shading are zero 



u(r, t) = u(r + dt,t + dt) 

= u(r, t + dt) + V r u(r, t + dt)d t + 0(\\d t \ 

Taking the gradient of equation 2.34, we obtain 



(2.34) 



daw 

V t u(r,t) = V t u(r,t + dt) + VV t u(r,t + dt)d t + -^-V r u(T,t + dt) + 0(\\d t \\ 2 ) 

~ V t u(r,t + dt) + VV r u(r,( + dt). 

The latter approximation is valid for a motion field d* that is either slowly varying or locally 
translational or constant. Combining equations 2.34 and 2.35 to eliminate the second-order 
terms and expanding u(r, t + dt) into a first order Taylor series in the variable t results in the 
constraint equation 

^M + \ (V r u(r, t) + V r .(r, t + *)) ^ = 0. (2.36) 

Although the constraint equation 2.36 only requires the gradients of the irradiance in two 
frames, it implicitly contains the second-order variations of the irradiance function and is, 
therefore, more accurate and stable than the classical first-order constraint equation. Second- 
order variations are incorporated into the constraint equation by considering a smoothed version 
of the spatial gradients. The spatial gradients in the preceding or following frame, depending 
on the formulation, are replaced by an average of both spatial gradients. This formulation was 
first introduced by Bierling (1986) in the context of optical flow computation for TV imagery. 
This formulation is a special case of the previously presented un warped equation where the 
constraint equation is evaluated at a virtual frame positioned in between two real frames. The 
DFU approach is more general because it computes the unwarped gradients at an arbitrary 
virtual frame. Equation 2.36 corresponds to evaluating the spatially corresponding gradients 
for a virtual frame situated exactly in the middle of two real frames, i.e. r = 0, d^ = 
in equation 2.30 and the DFD replaced by the temporal gradient of the irradiance function. 
Bierling reported improved stability in displacement estimates using equation 2.36 instead of the 
classical first-order equation. Additional improvements were expected and measured using the 
unwarped gradients because gradients of matching neighborhoods are averaged, unlike Bierling's 
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formulation where gradients of potentially unrelated regions are averaged in the case of moderate 
to large displacement fields. 

The parallel between the two formulations also makes apparent that the dynamical frame 
unwarping constraint equation captures the first and second order variations of irradiance func- 
tion although it was derived without the explicit use of the Hessian of the irradiance function. 

2.4.2 DFU Second-Order Constraint Equation 

Bierling's method can be easily extended to the case of incremental motion by considering the 
second-order Taylor series development of u(r + d^ n \t + di) where d( n ) represents an estimate 
of the displacement field at the n th iteration. A development similar to the one in the previous 
section yields the incremental second-order constraint equation 

D + i(V r u(r + d< n \i + dt) + V r u(r,t))(d - d< n >) = (2.37) 

it 

where the displaced frame difference is defined by D = u(r+d( n \t+dt)—u(r, t) and d represents 
the true displacement. Equation 2.37 is identical to (2.30) where the unwarped gradients are 
replaced by regular gradients. 

2.4.3 Discretized Second-Order Constraint Equation 

Section 2.4.1 presented a continuous second-order constraint equation that is approximated 
by first-order spatial gradients in two consecutive frames. This second-order equation 2.36 is 
a better approximation than the regular continuous first-order equation 2.10 because it takes 
into account both the gradient and Hessian of the irradiance data. However, when dealing with 
sampled data, these equations need to be discretized and the true order of the discrete equation 
depends on the type of stencils used in computing the spatiotemporal gradients. In particular, 
only a naive use of intra frame first difference or centered difference stencils will produce a 
different discrete version of equations 2.10 and 2.36. 

Horn (1986) proposed a consistent way of estimating the spatiotemporal first-order deriva- 
tives by using first differences in a 2 x 2 x 2 cube of irradiance value (see figure 2.7). If the 
indices i, j and k correspond to x, y and t, respectively, the three first partial derivatives of the 
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Figure 2.7: Cube of irradiance values for the estimation of the spatiotemporal gradients at the 
center pixel (from figure 12-7 of Horn (1986)). 



irradiance are obtained by 



E x = 



E v = 



E t 



2Sx \ 2 V Ei+1 ' j ' k ~ ■ B *^ fe ) + (■ e -»'+i.j+i.* _ Eij+i,k)} + 

2 [(Ei+i,j,k+i - Ei,j,k+i) + {Ei+i,j+i,k+i - -E»\j+i,fe+i)J J 

26y\2V Ei ' j+1,k ~ Ei ' j ' k ^ + (- E '+ 1 .i+ 1 . fc ~ £ »+i,i,fe)) + 

2 \{Ei,j+i,k+i - Eij^+i) + (Ei+i,j+i,k+i ~ E i+i,j,k+i)j J 

= 4fo\( Ei <i> k +' 1 ~ Ei ^ k ) + (Ei+i,j,k+i - Ei+i,j,k)+ 

(-E»,i+i,fe+i - E it j +lt k) + (Ei +li j +lt k+i - Ei+ij +1< k+i)j- 



A lot of the problems in estimating consistent spatiotemporal gradients, and in particular 
temporal gradients, are suppressed by the use of the DFU constraint equation that is applied 
directly to the sampled data. 
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2.5 Spatiotemporal Derivatives Computation 

A significant difficulty with differential methods is the requirement for accurate estimates of 
spatiotemporal irradiance gradients from the sampled irradiance sequences. We saw in sec- 
tion 2.2.3.3 that, if the original irradiance image is suitably bandlimited and noise free, an 
error-free reconstruction is possible by means of the interpolation formula 

u(x,t) = ^u,(x;,T)$i(x- Xi,t-r) 

X it T 

where the interpolation kernel $i(x, t) is the product of three ideal low-pass niters </>(a,z, T") 
with impulse response defined by 



(a^2?) = sincQ^(a-t2?) 



where T° represents the sampling period in the a-dimension. 

Once the continuous band-limited signal u(x, i) has been reconstructed in terms of the dis- 
crete samples u,(xi,r), the components of the spatial gradient V r u(x, i) and temporal gradient 

jr-2 — can be expressed by 

ot 

»,T 



t,T 



dy 



9 -^l = ^u,(x i ,TMz,z i ,TZMy,toTZ)j{&TX) 



»|T 



Unfortunately, real irradiance sequences are neither bandlimited nor noiseless. In particular, 
we showed in section 2.2.3.3 that, in practice, the signal is never sampled fast enough in the 
temporal direction, severe temporal aliasing always occurs and the temporal gradient estimates 
will be poor. On the other hand, temporal aliasing is usually not too visible in moving pictures 6 , 
because of the motion blurs. One way of avoiding this problem is to reformulate the constraint 
equation in terms of a displaced frame difference to eliminate the need for temporal gradients 
computation. 



*One notable exception is the highly visible phenomenon of the wagons wheels that are turning backwards in 
movies or on TV. 
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The problem of estimating reliable gradients from sampled noisy, possibly aliased data is 



a solve 




before deciding on a method 
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A wide spread and computationally cheap technique is to approximate the gradients by finite 
differences. In that case, the computation of gradients is equivalent to convolving the irradiance 
images with a filter mask, or stencil, and can be implemented very efficiently since most of the 
stencils used have very small supports. Typically, stencil support for gradient computations 
range from 1 by 2 to 5 by 5. 

An "alternate" solution, very much in favor in recent motion estimation work (Martinez and 

Urn 1986, Knuii 1937), is to parameterize the irrafcf function by fitting a mM§ t9 as 

data and computing the spatiotemporal gradients from the parametrized, continuous function. 
A close look at these two methods reveals that, in most cases, they are identical and that the 
more complex surface fitting method is only justified in particular cases. A detailed discussion 
of the topic will be offered in section 2.5.2. 

2.5.1 Surface Parameterization 

The problem of finding a local, continuous parameterization of the sampled, noisy irradiance 
function is equivalent to the problem of designing an interpolation filter that performs noise 
reduction during the reconstruction process. There is no optimal way of reconstructing a con- 
tinuous signal from noisy, aliased observations. Instead, we will consider a specific set of N 
orthogonal interpolation functions \P; that simplify the parameter estimation while offering a 
flexible and adequate fit to the irradiance function. The irradiance function u(x, t) is approxi- 
mated by the parametric signal u(x,t) defined by 



N 

u(x.,t) = ^u s (xi,T)*j(x-Xi,<-r) 
»=i 

_ TTTiTV 



(2.38) 
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irradiance samples. The function that is minimized is 

£ («.(x i ,r)-U r !P(x i ,T)) 2 
(xj,t)€Wxxw, 

where W x and Wt respectively represent the spatial and temporal analysis window used in de- 
termining the signal coefficient vector U. In order to obtain robust estimates of the coefficients, 
the analysis window is chosen large enough to overdetermine the fit, that is, to supply more 
observations than necessary to uniquely determine the coefficient vector U. Overdetermined 
fitting essentially performs a low-pass filtering of the irradiance function and reduces the noise. 
The least-squares estimate is easily computed and is given by 

U = ( Y,** T ) ( 5> tt 'te> T ) ) = A-M,. (2.39) 

\X,,T / \X;,T / 

The matrix A = J^ W T is independent of the data and is constant if the current pixel, on 
which the window analysis is centered, is used as the origin. Therefore, the matrix A -1 can be 
computed off-line in advance and only the vector b needs to be computed on the fly. Efficient 
computations of b are possible depending on the choice of the basis functions. 

A versatile and efficient set of basis functions is a set of orthogonal polynomials x t y 3 t t . 
Polynomial curve fitting has been used both by Martinez (1986) and Krause (1987). The 
choice of the specific degrees of the polynomials is a tradeoff between computational complexity 
and accuracy in modeling the irradiance function. A high order spatial fit x*y 3 allows very 
accurate modeling of the spatial variations of the irradiance function; a high order temporal 
model increases the flexibility in tracking the variations of the irradiance from frame to frame. 
Experimental evidence shows that a quadratic spatial fit, that is ti(x) = Uo + u x x + u y y+u xx x + 
u xy xy + u yy y 2 and & = (1 x y x 2 xy y 2 ) for basis functions, is a reasonable compromise. 
A higher dimensional model is not advantageous because, in many cases, the data are noisy 
enough that a higher than quadratic fit produces worse results and a lower order fit (planar) 
is too crude to model the spatial irradiance variations. In the DFU formulation, irradiance 
frames are dynamically unwarped; the unwarped temporal slices, within the analysis window, 
become very similar as the estimates of the displacement are refined. In fact, at the optimum, 
the only temporal variations are caused by the temporal variations of the shading model. For 
this reason, the temporal variations of the model are chosen to be at most linear. Higher 
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than globally quadratic basis functions (tx 2 ,txy,ty 2 ) are discarded because no performance 
improvements are observed in keeping the full 12 basis functions given by («P,iiP). As a result, 
the vector 3* of orthogonal basis functions is chosen to be ¥ = (1 x y t xt yt xy x y ). 

With this choice of basis functions, the most efficient scheme for computing the vector b 
is to use a sliding-block technique and to shift the center of the spatiotemporal window to 
(0,0,0). Components of b that are independent of x can be computed by sliding the window 
along the x-axis. At each step, the next value is obtained by subtracting out the effect of the 
first column at the left edge and adding the new column to the right of the analysis window. 
A similar method is used to compute the components of b independent of y by sliding the 
window along the y-axis and subtracting and adding top and bottom rows. In this fashion, 
only the b xy component needs to be computed with a full summation at each point. Further 
computation savings are achieved by precomputing the components b\, b x , b y , b xy , b xx , b yy in 
each frame and only performing the temporal summation to obtain bt, b xt , 6 y t on the fly after 
the frames are unwarped by the current displacement value. However, in practice, the frames 
are not unwarped but indices in the various frames of coefficients are shifted by the value of 
the displacement. The precomputation and storage of the primary components of b result in 
substantial saving in computation in the incremental DFU method where the gradients need 
to be evaluated many times at different offset locations during the iteration. The gradients are 
computed from the parametrized irradiance function by differentiation of equation 2.38. With 
the chosen basis function the gradients are simply expressed by 



= V x + 2xV x 2 + yV xy + tU xt 



du 
dx 

j^ = V y + 2yVy, + zV*y + tVyt ■ (2.40) 



— = V t + xV xt + yV yt 



du 
Hi 
If the gradients are only required at the original irradiance sampling lattice points, the partial 

derivatives in equation 2.40 are evaluated at the spatial location x = (0,0), 

' du 

— = V x + tV xt 
ox 



du 
dy 



— = U y + tV yt 



at * 
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In order to provide the fastest gradient evaluation for the usual case, i.e. computation of the 
gradient at the central pixel, midway between the two real frames, the parameterization coeffi- 
cient vector U is temporally offset so that t = corresponds to the temporal position midway 
between the two frames; the spatiotemporal gradients are given directly by the coefficients 
(U x , U y , Ut) of the parameterization coefficient vector. 

The next section focuses on the relationship between surface fitting and the stencil method 
of computing derivatives. 

2.5.2 Relationship Between Stencils and Surface Fitting 

At first glance, it appears that the surface fitting method is a much more flexible and accurate 
method than the stencil method since a least-squares fit for each point of the irradiance field 
is computed and the resulting coefficients are used to compute the spatiotemporal gradients. 
However, a closer look reveals that the two methods produce exactly the same results in most 
situations at the expense of a far higher computational cost for the surface fitting technique. In 
many cases, only the gradients at locations falling on the sampling lattice grid and at a fixed 
temporal position are computed. For this case, at most five coefficients of the parameterization 
are used (Ua,, U y , U t , V xt , Uyt); the computation of the gradients is space invariant. Then, 
the computation of the gradients with the surface fitting method and with a fixed stencil method 
are identical. Equation 2.39, which computes the values of the coefficient vector U in terms 
of the basis functions and the sampled data, is in fact an equivalent mathematical expression 
of the convolution of the data with a fixed filter mask determined by the choice of the basis 
functions and the size and shape of the spatiotemporal analysis window. The matrix A - is 
fixed, space invariant and only depends on the basis functions and the size of the window 7 , 
while the vector b both depends on the vector «? and on the data. Vector b can be rewritten 
in the more explicit form 

b = QD 

where Q = (!P(xi,*i) «P(x 2 ,*i) ... ^(x,,,^)), is the matrix of the basis vector & evaluated 
at each arbitrarily numbered point (x;,£i) of the analysis window in the local pixel-centered 
coordinate frame, and D T = (u(xi,ti) u(x 2 ,t 2 ) ... u(x n ,* n )j represents the vector of 

T This statement is only true when A is evaluated in a local coordinate system centered at the current pixel. 
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the values of the irradiance function evaluated at the same points. Appendix C symbolically 
computes the matrix A for the chosen basis functions for a general and symmetric window and 
gives the numerical values of the matrix and its inverse for 3 X 3 and 5x5 spatial windows. 

2.5.2.1 Example of Stencils Generated by Surface Fitting 

In this section, the results of appendix C are used to show what types of stencils are implicitly 
used in usual surface fitting configurations. A~^ denotes the a, (3 element of the inverse of 
the A matrix where the elements are indexed by the basis functions (eg. A~ tyt ) and the 
vector P*'* = (P*'* Pi*' P^'* Pi* P y f PL'* ) T represents the spatial summation of 
the product of the # basis functions with the irradiance function within the spatial analysis 
window in frame i, that is P*j'* = ]T\(x,tj), P x * = ^xufaU) etc. With this notation, 

the vector b is expressed by 

fepi* £«** En* E«t* E*s? Enf E*sf E<nf E' p f )'• 

\ t t t t t t t t t J 

Let us examine the value of the temporal coefficient U t . For a symmetric, square window 
(that is, W x = [-k..0..k] = W y and W t = [-1..0..1]), U t simplifies to 



U« = AS EpH+A^ E«PS* • (2-41) 



For a 2-frame temporal window (t = and t = 1) and an N X JV-point spatial window, 

N 2 



A# = -2A- = -|, Z t Pf = P?* + Pj'*, X t tl>f = Pj'* and equation 2.41 



evaluates to 

u, = £ (p!«* - p?*) 

\W X W X 

i.e. the temporal gradient is computed by the average of the frame difference over the N X N 
window. Similarly, the spatial gradient in the first frame is expressed by 

/ \ 

L 1 wx i^y w x . 

\ Wx Wjr I 
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and represents the N line (row) average of the N/2 centered differences. For example, with a 
3x3x2 analysis window, the corresponding stencil is 



1 1 
3 X 2 
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1 
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1 
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-1 


1 





-1 -1 



1 1 



\ 



J/ 



(2.42) 



2.5.2.2 Advantages of Curve Fitting 

In the previous section, we saw that surface fitting has no advantage over stencils if the only 
task is to compute the spatiotemporal gradients at lattice points; it is a waste of time to 
determine a local parameterization of the surface. The major advantages of the continuous 
parameterization of the irradiance function are the direct evaluation of the function and gradi- 
ents at non-grid points without interpolation and the cheap computation of multiple irradiance 
values and gradients at neighboring locations from a single local parameterization. The latter 
feature is very important and desirable in the DFU method because it provides a fast way of 
evaluating multiple gradients at low cost within the inner incremental parameters computation 
loop. Once the iteration has reached the correct parameters within one pixel, the same local 
parametrization can be used to compute the gradients by equation 2.40 and convergence to the 
optimum parameters is very fast. 

Finally, the next section explores the parallels and differences of surface fitting and direct 
interpolation of the irradiance function and its gradients. 



2.5.3 Approximation versus Interpolation 

The previous discussion highlighted the major advantage of the surface fitting method as op- 
posed to the stencil technique: the determination of a continuous signal model that approxi- 
mates the irradiance function. This property is essential in the DFU algorithm because most 
of the computations occur at neighboring, non-grid points and the continuous, local approxi- 
mation of the irradiance field allows efficient irradiance and irradiance gradient determination. 
An alternate path in obtaining a continuous model of the irradiance function is to use spa- 
tial interpolation. One fundamental difference between interpolation and surface fitting, or 
approximation, is that in the former case the values of the interpolated function and of the 
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original function match at the sample data points (e.g. Lagrange interpolation polynomial). 
One drawback of interpolation is that, in the case of noisy data, it makes little sense to match 
the noisy values and it is preferable to have a global approximation of the data and interpolated 
points. The problem can be alleviated by first smoothing the data with a low-pass filter before 
interpolating. Low-pass filtering is implicit (see section 2.5.1) in the process of surface fitting. 
A more global way of looking at the problem is to realize that both interpolation and surface 
fitting are special cases of the linear filtering problem of noisy, sampled data. 

Let us examine the properties of the linear (in the system sense) bilinear, biquadratic and 
bicubic interpolator that models the irradiance function by a first-, second- and third-degree 
surface. Since these operations are separable, only the 1-D case is discussed. Once the surface 
is modeled, the gradient estimates at arbitrary locations are inferred from the local irradiance 
model and, due to the linearity of the system, the spatial gradients are linear functions of deriva- 
tive of the impulse response of the filter. The three linear (in the system sense) interpolators, 
linear, quadratic and cubic are Lagrange polynomial interpolators. Let 0(x;))i e £> represent the 
values of the discrete function g for the points x; belonging to the domain D. The Lagrange 
interpolator functions gu n {h), g gU a(h) and g cu b(h) are defined by 

9lin{h) = (1- h)g(xk) + hg(x k+1 ) 

g q ua(h) = {H ~ 1} 2 (/l " 2) 9{*k) - h(h - 2)g(x h+1 ) + M£jLH g(a , fc+8 ) 

. ,,, h(h-l)(h-2) . x , (h-l){h+l){h-2) , 
9cub{h) = — i £ '-g{*h) + - 2 ff(*fc+i)- 

h(h-l)(h-2) . , , h(h -l)(h + l) , 

f -9{*k+2) + — £ -gfa+s) 

where h £ [xk,Xk+i]. The interpolation kernels are inferred from these interpolator functions 
and the impulse responses are given by the expressions 



Linear Interpolator 



Quadratic Interpolator < 



u(x) = 1 - |z| for < |a?| < 1 

= outside 

u(x) = -|a:| 2 -(- 1 for < \x\ < - 

= -l\ x \*-l\z\ + l for | < |*| < | 

= outside 
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Cubic Interpolator < 



u(x) = Ux\ 3 - \x\ 2 + hx\ + 1 for < \x\ < 1 

L it 



i, ., , ,, i 

6 6 



3 + |*| 3 -£|*| + l for 1< |*| < 2 



= outside 

and the impulse responses of their derivatives are obtained simply by differentiating the impulse 
response with respect to the variable a;. Figure 2.8 (a) shows the impulse response of the 
linear, quadratic and cubic interpolators. Figure 2.8 (b) displays the impulse response of their 
derivatives. Note that the linear and cubic interpolators have continuous impulse responses 
and that the quadratic impulse response is discontinuous at x = —1.5, —.5, .5 and 1.5, while 
the impulse response derivatives are discontinuous for the linear and cubic interpolator but 
continuous for the quadratic interpolator. Experiments have shown that motion estimation 
algorithms are far more sensitive to gradient discontinuities than irradiance discontinuities. 
Discontinuous gradients can produce instability and divergence of the algorithms. Bilinear and 
bicubic interpolators have consistently produced far poorer results than the biquadratic one. 
Algorithms tend to be unstable with the former interpolators and badly behaved optical flows 
are produced when these interpolators are implemented in optical flow algorithms. 

If the extra accuracy of the bicubic interpolator is required, it is possible to design an 
bicubic C 1 interpolator. Keys (1981) interpolator is a bicubic interpolator that provides both 
a continuous impulse and continuous derivative impulse (see figures 2.9 (a) and 2.9 (b)). In 
fact, Keys' interpolator matches the Taylor series development of the function at the origin 
up to the third derivative. Keys and biquadratic interpolators tend to perform equally well 
for optical flow computation although the optical flow produced by the Keys' interpolator is 
slightly smoother. 

Quadratic surface fitting on raw data and Keys cubic interpolator on low-passed data, have 
comparable computational complexity and give similar results when used in the FICE method. 
However, the quadratic surface fitting scheme displays a clear advantage in the DFU method 
because the fit is computed using multiple unwarped frames that increase the noise smoothing 
and produce a more accurate parameterization of the irradiance function than the one generated 
by the pure intra-frame Keys cubic interpolator. After testing different interpolators, quadratic 
surface fitting was used exclusively in the implementation because it offered the most flexibility 
and very good results with noisy data at a reasonable computational cost. 
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Figure 2.8: (a) Impulse responses of the linear, quadratic and cubic interpolators, (b) derivatives 
of impulse response of the linear, quadratic and cubic interpolators 
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Figure 2.9: (a) Impulse response of the Keys cubic interpolator, (b) derivative of impulse 
response of the Keys cubic interpolator. 
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2.6 Summary 

This chapter derived a differential constraint equation that links the irradiance spatiotemporal 
gradients, the motion and structure parameters and the temporal variations of the shading 
model of the surface. The constraint equation was derived for a general surface, or equivalently 
for an arbitrary depth map and a generic shading model. The surface was then specialized to 
be a polynomial patch in order to characterize it by a small number of parameters that could 
be estimated directly. Several constrained and unconstrained minimizations were presented in 
the general case for polynomial patches and one of the most complete and general models was 
solved to explicitly derive the system of vectorial nonlinear equations that defines the motion 
and structure parameters. The complexity and structure of the nonlinear system was described 
and several potential numerical methods presented and evaluated in terms of their ease of 
implementation, convergence properties and speed. Finally, the computation of the irradiance 
spatiotemporal gradients was discussed in detail and the advantages and drawbacks of several 
methods argued and compared in terms of their behavior in the presence of noise, efficiency in 
computation and flexibility. 



Chapter 3 



Shading Models 



In general, the reflectance function is very complex and cannot be modeled easily. One possible 
approximation for the image irradiance is to consider it as the sum of an ambient (indirect) light 
term and of incident (direct) light term that is composed of the weighted sum of Lambertian 
and specular components. The ambient illumination represents the light incident from the 
environment, i.e. reflections of the direct light from all the materials in the scene, while the 
incident illumination is the light originating from specific light sources with no intermediate 
reflections. In practice, ambient light is not easy to estimate and subtract out but is an irrelevant 
issue when differential techniques to estimate the motion are used. 

The use of specular reflections in conjunction with motion estimation is not easy and might 
not provide useful and reliable information. One of the problems is that the resulting expres- 
sions are very cumbersome and quite useless. Computer graphics has produced very realistic 
models for specular reflections and the more realistic models, like Bui-Tong's (1975) or Blinn- 
Torrance's (Torrance and Sparrow 1967), are very powerful in image synthesis but are rather 
awkward to use in analysis. Another problem is the unreliability of the information: specular 
and glossy reflections vary greatly with surface finish and a smudge or a fingerprint can change 
them a lot. Moreover, unless one is very careful in acquiring the data, sensors can saturate, in 
which case the data represent the sensor properties, not the highlight characteristics. 

If the only goal is to eliminate the undesirable specular reflections, a possible segmentation 
of a patch into Lambertian and specular surfaces can be based on the heuristic that, most of 
the time, the light sources are "white" and a specular reflection corresponds to an image of 
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the source and, therefore, appears to be very bright and achromatic. If we are given a color 
representation of the data (like RGB or XYZ) we can compute the chromaticity components 
(rgb) or (xyz) and segment the latter based on the occurrence of bright, equal chromaticity 
patches. 

In the rest of the thesis, we will assume that we are dealing with a purely Lambertian 
surface or a segmented Lambertian surface where the purely specular and glossy regions have 
been removed. Such a model is a good approximation to real data and allows us to account for 
the variations of irradiance due to shading in the constraint equation. 

3.1 Lambertian Model 

In general, the amount of light captured by a surface patch will depend on its inclination relative 
to the incident beam. As seen from the source, the surface is foreshortened and its projected 
area is its true area multiplied by the cosine of the incident angle i. Thus the irradiance is 
proportional to cosi. In order to have the radiance of the surface patch proportional to cosi, 
we need a surface that reflects all the incident light and appears equally bright from all viewing 
directions. The ideal diffuser, or Lambertian reflector, is such a surface. 

3.1.1 General Lambertian Model 

If we assume that the Lambertian surface has a continuously varying albedo p\ with respect 
to the surface coordinates (a,/3) and a radiance £(R), that can be expressed in the observer 
frame, the shading equation can be written as 

E(r,t) = £(R) = px(a,/3)L cos i = p x (a,(3)L {i ■ n) (3.1) 

with 

- (1-R) , T , „ 

1= irTR[j - d L = '- R 

where 1 represents the source position in the observer frame, R is a point on the surface measured 
in the observer frame, n is the unit normal at the point R and Lq is the intensity of the source. 
With these definitions, L represents the mumination vector from the point on the surface to 
the light source. 



3.1. LAMBERTIAN MODEL 83 

The irradiance equation (3.1) is the product of two distinct terms, a purely surface dependent 
term />>(a,/3) that represents the markings or texture of the surface, and an illumination term 
Zo(L • n). In general, the illumination term depends both on the position and normal of the 
illuminated point on the surface. Thus the spatial shading variations are due to the relative 
position of the point with respect to the source and the local orientation at that point. The 
spatial variations of the irradiance, with a nearby source, are complex to analyze even for simple 
surfaces like planar patches. In most cases, the source will be far enough from the surface (i.e. 
L ~ I) and the shading variations will only depend on the local curvature. In particular, for a 
planar patch, the illumination term is constant for a distant source and the irradiance variations 
are purely due to the texture. 

We saw in Section 2.3, that in the FICE, the shading effects are represented by the temporal 
variations of the irradiance equation. The temporal derivative of the shading can be expressed 
(see appendix D) as 



E =^|p((-(«xR + t).ft) + (L.(«xA)) + ((«xR + t).L)(i.A)) 

= ^£^ %"> a) + a • *)E,«,*i - ((t - (t • m . »)) < 3 - 2 ) 



^U) 



E t 



where the term E^ is due to rotational motion and the term E% is due to translation. The 
notation [L,u>,R] represents the mixed product (determinant) of the three vectors L, u> and 
R. 

The rotational component of the temporal derivative can itself be decomposed in two distinct 
elements, a "distant" term E% = [l,u>, n]/||L|| and a "near" term (L • n)[L,aj,R]. The former 
term can be approximated by [l,ti?,n] if ||R||/||1|| < 1 and captures the contribution to the 
temporal shading variations of a infinitely distant point source, while the latter term represents 
the temporal variations of the shading induced by the nearby spatial variations of shading. 

The translational component only depends on that part of translation orthogonal to L, 
t' = t — (t ■ fi)l. This result could have been predicted directly since a translation does not 
change the orientation of the patch; the incident angle of the rays is the same in the case of 
a translation along the rays and irradiance values of a Lambertian surface only depend on the 
relative angle between the surface normal and light source direction. 
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The expression (3.2) for the temporal shading is, in general, complicated to use in the 

constraint equation. Part of the complexity is due to the presence of the variable R (explicit in 

Zr 
Eua and implicit in ||L||). In fact, expressing R in terms of the image plane variable r, R = — 

makes explicit the dependency of E on the depth, or structure. To remove this dependency, 

special cases of motion, like pure translation or rotation, and/or approximate expressions for 

E need to be considered. 



3.1.2 First-Order Lambertian Model 

In most situations, the relative distance of the light source from the object and object to camera 
is very small i.e. ||R||/||1|| < 1, and E can be approximated (see appendix D.2) by a first-order 
Taylor series with respect to ||R||/||1||, 

E = p X (a,P)L ([i,u,,n] + (I • n)[U,R]H - (t{ • A)p) (3.3) 

where t^ = t — (t • 1)1 and R = f is the unit viewing direction vector. 

Unlike the rotational component E&, the translational term — pLo{yi • n)/||l|| in (3.3) only 
depends on the position of the light source, on the translation and orientation of the patch and 
is independent of the structure of the patch. 

Assuming that the light source is at infinity (||1|| — ► oo), in the direction 1, the previous 
equation simplifies greatly to 

E = p x (a,P)L &<*,h]. (3.4) 

The previous expression (3.4), for a distant point source, can be derived directly from the 
distant Lambertian shading equation, 

E = Px (a,P)L {\-h) (3.5) 

by differentiation with respect to time 

E = p x L (-^ • 1J = p x L ({u> x n) • 1) = p A X [«, M]- 
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3.1.2.1 Lambert ian Model With Hemispherical Source Along 1-Axis 

In this model we assume that the light source is an hemispherical uniform source along the axis 
I. The shading equation, see (Horn and Sjoberg 1979), takes the form 

E(r,t) = £(R) = ip A (a,/3)i (l + (n-1)) 

and the temporal variations of this shading model are expressed as 

E = ±px(a,P)Lo[u>,h,t\. (3.6) 

This hemispherical source model can be generalized to an arbitrary tight distribution. It can be 
shown that an arbitrary light distribution on a unit sphere is equivalent to a single, infinitely 
distant, collimated tight source in a direction 1. The punctual light source direction 1 is denned 
by the unit vector that links the origin of the sphere to the center of mass of the light distribution 
on the unit sphere. 

It is remarkable that equation 3.4, for the general distant Lambertian case, and equation 
3.6, for the hemispherical extended source along the 1-axis, are formally identical within a 
multiplicative constant factor 1/2. However, this formal identity is only superficially true. In 
fact, the irradiance of a Lambertian surface illuminated by a collimated source is expressed by 

E{r,t) = /9 A (a,/3)£ o max(0,(i • ft)), 

with the result that the temporal variations of the irradiance of a Lambertian surface for a 
distant source and a uniform hemispherical source along the 1-axis are only identical, within 
a scale factor, for surface orientation such that (1 • ft) > and differ otherwise. Figure 3.1 
shows the irradiance function as a function of the product (1 • ft) for a distant source and 
an hemispherical source. In the following sections, distant point sources and hemispherical 
extended sources along the 1-axis will not be distinguished although they are not identical. 

3.2 Attenuated Lambertian Model 

The Lambertian model presented in the previous section 3.1.1 only applies to sources and con- 
ditions where the apparent brightness of the source does not significantly change with distance. 
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Figure 3.1: Lambertian reflectance for a collimated distant source and for an hemispherical 
uniform source 

When dealing with artificial light sources the attenuation of the intensity of the light source 
with distance must be taken into account. The power emitted by a light source is dissipated 
isotropically in a spherical volume resulting in an effective attenuation that is a function of the 
inverse of the square of the distance between the light source and the object. This attenuation 
is not an issue for a natural light source like the sun because the sun is so far away that no 
changes in attenuation can be measured on earth. 

In general, additional attenuation is produced by scattering and diffraction but its study 
in different mediums is complex and requires the knowledge of the physical properties of that 
medium. In this section, we consider a case of practical importance where the attenuation needs 
to be taken into account, the case of underwater photography or filming, with artificial lights. 

3.2.1 General Attenuated Model 



In clean, undisturbed water, a Lambertian surface illuminated by a punctual source can be 
modeled by a Lambertian reflectance function where the apparent intensity of the light source 
decreases as the square of the distance between the source and the patch. This model does 
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not take into account the phenomenon of light scattering and diffraction caused by microscopic 
particles in suspension in the water and the variations in transmission and refraction indices 
of the water due to temperature gradients and underwater currents. However, this attenuated 
Lambertian model is a reasonable approximation of underwater vision and the irradiance of the 
Lambertian surface can be expressed as 

£(r,*)=A(L.n) (3.7) 

where R = ||.R|| represents the distance from the camera to the patch. 

It is straightforward to compute the temporal derivative of equation 3.7 using the expression 
of the temporal derivative of the general shading equation. If ^General Lamb represents the 
temporal derivative of the irradiance of a Lambertian patch, as denned by equation (3.2), the 
underwater shading equation can be written as 

E = -jrj -^General Lamb p4~^ ' ft )( R ' *) 

The resulting expression is similar to the simpler expression (3.2) but is of little practical 
value unless reasonable simplifying assumptions can be made. Unlike the regular Lambertian 
case, it cannot be assumed that the light source is distant, since beyond a small zone, all light 
is absorbed by the water and none would reach the object. On the other hand, the problem 
can be simplified by assuming a realistic, particular geometry of light source and camera. 

3.2.2 Light Source, Viewer Approximation 

In a typical underwater set-up, the lights are mounted directly on the camera and, to a first 
approximation, light and camera can be assumed to be coincident. Under this assumption, the 
light source and the camera are located at the origin of the coordinates, L = — R, 1=0, and 
the general equation 3.8 simplifies to 

E =-^((t-ft)-3(ft.t)(ft.ft)) 

=-^K( i *- 3fefti >- 6 ) =- e w{( i >- 3 " t >- & )- <3 ' 9) 



'Although convenient for a single manned camera, much better results are obtain by widely separating the 
camera and light sources. 
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The latter expression is formally equivalent to a Lambertian-like expression where the light 
source vector L is replaced by (t - 3(R- t)R) /R 3 , a space and motion variant attenuated "light 
source". Since the light source and the camera are spatially coincident, the temporal variations 
of the irradiance only depend on the translation of the object and not on the rotation measured 
in the camera frame. 

3.3 Constraint Equation vs FICE 

The validity of the classical constraint equation (CE) has been qualitatively challenged due 
to its unjustified use. Opponents of the CE argue that the brightness constancy assumption 
is physically impossible and there is always some variation of the irradiance as the object or 
camera moves. Proponents argue that the CE is a good approximation of the physical reality 
and use optical flow estimates computed from the CE as evidence. 

Part of the confusion is that although the experimental conditions and the set-up of the 
problem widely vary, the CE is used indiscriminately. At least three factors are important in 
determining the validity of the CE: the physical structure of the problem, the nature of the 
surface and the type of motion. Two different classes of problems are generally considered, 
the passive navigation problem where the environment is fixed relative to the moving observer 
(camera) and the situation where an object is moving in the environment with respect to a fixed 
camera. These two problems are in general not the inverse of each other. They are equivalent in 
a kinematic sense, i.e. the motion of the camera with respect to the environment is the inverse 
of the motion of the environment relative to a fixed camera, but they are not, in general, 
photometric inverses because of the light sources and because of the reflectance properties of 
the objects. In the passive navigation problem, the overall environment moves with respect to 
the camera, but imaged objects are fixed relative to the fight sources. Therefore, there are no 
surface shading variations induced by the relative motion of the object and light sources. 

The second factor, the nature of the surface, can be divided into three distinct components: 
the surface reflectance property, the albedo and the shape of the surface. The surface reflectance 
of the imaged object dictates the amount and type of shading variations that occur when either 
the fight sources move relative to the object, the camera moves relative to the object or both 
happen. We saw in section 2.1.2 that the surface reflectance is described by the bidirectional 
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reflectance distribution function that depends on both the illumination and viewing directions. 
For example, the brightness of a Lambertian surface is independent of the viewing direction, 
therefore no temporal irradiance variations occur in the passive navigation situation and the 
constraint equation is exact. In general, no surface is truly Lambertian but the changes in 
irradiance caused by the variation of the viewing angle are usually negligible. The albedo of 
the surface, or texture, is a major factor in determining the validity of the CE approximation. 
Horn (1988) shows in the passive navigation case, using a sinusoidal grating on the image plane, 
that the changes of brightness of the surface with time are small as long as there is significant 
contrast at high spatial frequencies, i.e. the image spatial gradients due to the texture are 
dominant in the CE with respect to the temporal changes in shading (texture driven CE). 
The approximation becomes worse as the surface markings become weaker. At the limit, for a 
textureless surface, all variations are due to the shading (shading driven CE) and, by design, 
the classical CE cannot deal with these cases. 

Surface shape is another important element in determining the validity of the CE approxi- 
mation. The approximation is the best for a planar surface and is increasingly worse for highly 
curved surfaces. Shading and its variations depends on the normal of the patch; as the normal 
varies rapidly, the spatial and temporal variations of shading become larger. 

The third factor, the type of motion, is also a key element in the validity of the CE in 
the case of an object moving relative to the light sources. Intuitively, rotational changes are 
expected to be substantial because rotation causes a change in the surface patch orientation and 
therefore, a modification of the irradiance. On the other hand, few variations are anticipated 
in the translation case. This intuitive analysis is consistent with the expressions obtained for 
the temporal variations of the shading (equation 3.2 in the general case). In fact, looking at 
the first-order expansion (3.3) of the general equation (3.2), shows that the temporal variations 
of shading depends on the rotational motion u> with a zero-order term [l,u>,R], but on the 
translational motion t with a first-order term (t x • A)—. 

The analysis of the effects of the temporal variations of the shading can be carried out 
analytically to some extent and experimentally in a broader sense. The next section focuses 
on the analytical analysis of some configurations of texture and shading models in order to 
evaluate the importance of the various terms in the FICE. 
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3.3.1 Analytical Analysis of the FICE 

The goal of the analysis is to separately compute the three components of the FICE and evaluate 
their relative magnitude as a function of the texture and the motion. The three constituents 

of the FICE are (V r E • f ), the product of the spatial gradient with the optical flow, E t , the 

dE 
interframe change of the irradiance and — — , the rate of change of the irradiance. In order to 

at 

compute the various quantities in terms of texture, the texture of the object on the image plane 
needs to be tracked to express the image coordinates of the projected object in terms of the 
original surface coordinates of the texture on the object. 

In general, E t is the most complex quantity to evaluate analytically because it relates the 
irradiance of two separate surface points that have different albedos and surface normals. V T E 
is complicated to compute for a general surface since the variations of irradiance are due to 
the variations of both surface normals and texture. However, for planar surfaces, the normal is 
constant and the spatial variations are only due to the spatial variations of the albedo, unless 
the source is close to the surface and each point of the surface sees a light source at a different 
spatial location. E relates the irradiance of two image points that represent the same point 
on the object and, therefore, have the same albedo but a different unit normal and is easily 
computed. 

In order to compute the different quantities of the FICE in terms of the texture on the 
surface, we need to express the geometric relationship between the surface coordinates U = 
(u,v,w) of the object, in which the texture is fixed, and the world coordinates X or image 
coordinates x. For clarity, the equations are shown for the planar case only. The formulation 
for a general surface is similar, although the mapping between the surface and world coordinates 
can be much more complex. Let us consider the rotation matrix P and the translation Xo such 
that 

X = PU + X 

or, equivalently, 

U = P T (X - X ). (3.10) 

It is always possible to find a nonunique matrix P and vector Xo that maps the arbitrary plane 
in space to a frontal plane at the origin of the world coordinates, i.e. the 2-D surface coordinates 
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u r = (u, v) coincide with the 3-D world coordinates X r = (X, Y, 0) and with the 2-D image 
coordinates x r = (x,y) at the origin. Equation 3.10 enables us to directly compute the albedo, 
given by a function of the surface coordinates, at an arbitrary point X of the object. In most 
cases, it is more convenient to directly consider the relationship between the image and surface 
coordinates, since we observe the irradiance function 2?(x, t) which is expressed as a function 
of the image coordinates. If n represents the unit normal of the planar surface, we have the 
dual relations (assuming a unity focal distance) 

r= ( p P .^) +z . (pu+x ° ) 
u = pT (t^' - Xo ) • (3 - n) 

The irradiance spatial gradients are computed from the albedo gradients by means of the 
Jacobian J ux of the transformation from x to u. For a planar surface denned by the normal n 
and illuminated by a infinitely distant light source in the direction 1, E(x.,t) = L (l • ii)/>(u). 
The spatial irradiance gradients are expressed by 

V x £ = L {\ • n)J ux V u ? 

= Lo(\ • A)^^ (((x • n)I 2 - nx r )p + (P - np r )) V u p, 

where P = I2,3PI3,2 is a restriction of the rotation matrix P, p r = (P3i?P32) is the vector 

( Y * ^ 1 

formed by the first two components of the last column of the matrix P and — jt- = — . 

Using the expression of optical flow (equation 2.9) rewritten in the matrix form 

r = Au> + -T 
Zi 



where 



-xy 1 + x 2 -y 
A = and T = 




-(1 + y 2 ) xy x 

the term (V X E • f ) can be computed in term of the motion parameters, the image coordinates 
and the analytical expression of the albedo spatial gradients 

(V X E • f ) = Xo(I • A)^^(u, r A + iT T ) (((x • n)I 2 - nx r )p + (P - np r )) V u p. (3.12) 
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In the case of a distant light source, the temporal variations of the shading are given by 
(3.4), and can be rewritten in the form 

E(x,t) = L p(u)$,u,,R] (3.13) 

with u is expressed in terms of the image coordinates by means of (3.11). 

Equations 3.12 and 3.13 allow the computation of the two terms E and (V X E • f ) as a 
function of the motion and texture. The ratio 6 of E and (V x E-r) that can be computed using 
(3.12) and (3.13) when the denominator is nonzero is a meaningful quantity in judging the 
accuracy of the CE approximation. This criterion is used in the examples of the next section 
to display the accuracy of the constraint equation. 

It is difficult to draw general conclusions from the expressions (3.12) and (3.13) because 
their analytic form is fairly complex in the general case. If the simplified case of a sinusoidal 
grating p(u) = (l + sin(f • u)) , where f T = (/, g) represents the vector of spatial frequencies of 
the grating on a frontal plane (i.e. P = I3) is used, the previous equations take the simplified 
form 

£ = £o[i,«,R](l + sin(f.x)) (3.14) 

and 

(V X E • r) = Lo{fi + gy) cos(f • x). (3.15) 

Equation 3.15 suggests that the term (V X E • f ) of (3.14) can be very large compared to the 
term E due to the multiplicative factor (fx + gy), provided the motion is not parallel to the 
ridge of the grating, i.e. (fx + gy) — 0. However, for a given spatial frequency f, E may not 
be negligible with respect to the other term, since E depends on the rotational velocity u> and 
both terms are multiplied by the texture contrast Lq. This example demonstrates that when 
the object moves with respect to the light sources, Horn's conclusions (Horn and Weldon 1988) 
cannot be inferred even in the same simple case. 

The previous discussion shows that, even in very simplified situations, it is not obvious that 
the temporal shading variations are negligible in the constraint equation. In practice, a texture 
contains many different spatial frequencies and the accuracy of the CE approximation can only 
be judged by the tedious process of numerically evaluating the different terms of the FICE 
and comparing their magnitudes, or by performing the parameter estimation with the CE and 
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Figure 


Rotation o> in degrees 


Translation t in pixels 


a-b 


(.5,-.4,.6) 


(.64,.4,-.32) 


c-d 


(.5,-.4,.6) 


(.064,.04,-.032) 


e-f 


(.05,-.04,.06) 


(.064,.04,-.032) 



Table 3.1: rf-plots motion parameters. 

FICE on known surfaces with a known motion and by comparing the accuracy of the motion 
and structure estimates. The next section provides a few examples that qualitatively show the 
goodness of the CE approximation for different textures and amounts of motion. 



3.3.2 Qualitative Assessment of the CE Approximation 

The examples presented in this section use the ratio S as a measure of accuracy of the CE 
approximation. Specifically, an 8-bit grey level, thresholded plot of the ratio S, expressed 
as a percentage, is used. The overall image is shifted by 128 to represent signed quantities. 
Consequently, neutral grey represents 0% ratio, i.e. E — 0. Once a threshold T is chosen, all 
values of 6 such that 6 > T, are mapped into black (0), and similarly values of S such that 
6 < -T are mapped into white (255). All other values 6 G [-T, T] are uniformly mapped by a 
grey scale ramp. Two thresholds, 5% and 10%, are used in the plots. 

The first example shows the influence of rigid motion, rotation and translation, on the CE 
for a multiplicative sinusoidal grating. Figure 3.2(a) displays the irradiance of a plane rotated 
10 around the x-axis, 15 around the y-axis and 30 around the z-axis, and mapped with 
the texture p(u) = 128(l + cos(7ru)cos(xt;)), with u,v £ [—1,1]. Figure 3.2(b-d) represents the 
temporal gradient, x-gradient and y-gradient respectively for a rigid motion of the plane given 
by the parameters u? r = (.5, —.4, .6), expressed in degrees of rotation around the elementary 
axis, and t r = (.64, .4, -.32) expressed in image pixels. Figures 3.3(a-f) represent the £-plots 
with a threshold of 10% for the images on the left and 5% for the images on the right. Table 3.1 
summarizes the values of the motion parameters for the various images of figure 3.3(a-f ). 

The results in the figures 3.3(a-f) confirm the intuition and the analytical results of the 
previous section. Translation has a very small effect on the CE accuracy while rotation has 
a major impact. Only a slight difference is visible between the images 3.3(a-b) and 3.3(c-d) 
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Figure 3.3: 10% and 5% <S-plots for three sets of motion parameters, given by table 3.1, with 
the cosine grating. The plots on the left ((a) top, (c) middle, (e) bottom) are the 10% plots, 
the plots on the right ((b) top, (d) middle, (f) bottom) the 5% plots. 
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although the translation has been decreased by an order of magnitude. On the other hand, 
a dramatic change is observed when the rotation is decreased by an order of magnitude from 
figure 3.3(c-d) to figure 3.3(e-f). 

The second and third example shows the influence of texture on the accuracy of the CE. 
The example of figure 3.4 uses a cosine grating of multiplicative ramps, defined by p(u) = 
128(1 + cos(tt 2 uv)), with (u,v) G [-1,1], as a texture mapped onto a plane identical to the 
one used in figure 3.2. The example of figure 3.5 uses an exponentially damped cosine defined 
by p(u) = 128(1 + e" 1,5|u| cos(27r|u|)), with (u,v) G [-1,1] and is mapped on the same planar 
surface as before. The motion parameters of the moving plane in these two cases are the same 
as the one used in figure 3.3(a-b), namely u> T = (.5, -.4, .6) and t T = (.64,-4, -.32). These 
examples demonstrate that for some types of texture the CE is a fairly bad approximation and 
the FICE should be used instead. 

3.4 Summary 

In this chapter, several Lambertian models were presented and their temporal variations com- 
puted and examined. The general Lambertian model was considered first. This model captures 
both the shading effects of an infinitively distant source and the spatiotemporal variations in- 
duced by a nearby source and is therefore fairly complex especially for high order surfaces. Two 
simplifications of this general model were derived, a first-order expansion in terms of the ratio 
of the distance between the object and the camera and the distance between the light source 
and the camera, and the case of an infinitely distant source. The former simplified model is 
helpful for sources that are neither distant nor immediately next to the surface, while the latter 
model represents the usual situation of natural light illuminating objects on earth and was 
shown to be very similar, but not identical, to the case of extended light sources. An atten- 
uated Lambertian model was proposed and simplifying assumptions introduced in the case of 
underwater photography. Finally, an analytical analysis of the FICE was done and the validity 
of the CE examined. More specifically, the importance of shading and the influence of the type 
of motion and texture was evaluated in order to assess the validity of the CE approximation. 
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Figure 3.4: Multiplicative cosine grating on slanted plane, (a) (top left) is the irradiance image, 
(b) (top right) the temporal gradient, (c-d) (middle left and right) the x and y gradients, (e) 
(bottom left) the 10% <5-plot and (f) the 5% <5-plot. 
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Figure 3.5: Exponentially damped cosine on slanted plane, (a) (top left) is the irradiance image, 
(b) (top right) the temporal gradient, (c-d) (middle left and right) the x and y gradients, (e) 
(bottom left) the 10% <5-plot and (f) the 5% <5-plot. 



Chapter 4 



Planar Patch Estimation 



This chapter discusses the specific problems and forms of the general minimization equations, 
developed in chapter 2, when applied to the planar patch case. Specific implementations with 
various shading models are derived and results of the algorithms on real and synthetic data 
presented. 

A planar patch is the easiest surface structure that can be considered but is still a fairly 
complex case to study in the context of the FICE for a general Lambertian shading model. 
The planar case is well studied and understood; Negahdaripour and Horn (1987) presented a 
closed-form solution in the general motion case. Their solution amounts to solving an eigenvalue 
problem in the case of the traditional constraint equation. However, in the FICE formulation, 
no closed-form solution exists; the parameter estimates are determined by a system of nonlinear 
vectorial equations that vary in complexity depending on the choice of the shading model and 
albedo function. Once the vectorial system is determined, a scalar system is obtained by 
projection of the vectorial equations onto various axes. Although, in theory, the solution of the 
original vectorial system does not depend on the choice of the projection axes, the algebraic 
complexity and numerical behavior of the resulting scalar system vary greatly; a proper choice 
of the projection axes is required to obtain accurate parameter estimates. 

Several different implementations are derived for various shading and surface models; ex- 
amples of parameter estimation are presented in the next sections. 

99 
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4.1 Implementations of the Planar Patch Case 

A planar patch is exactly represented by equation (2.23), — = „„ , in the viewer frame, that 

Z FZq 

is, the reciprocal of the depth is expressed as a linear functional in the image coordinates r. In 

(r • n) 
the FICE, the structure term, , only appears in the product with the term (s-t), resulting 

FZq 

in a scale ambiguity between n and t. In fact, the same solution to the problem is obtained if 
n is replaced by kn and t by t/k, therefore, the translation t can only be recovered within a 
global scale factor, or in other words, only its direction can be determined. As a consequence, 
the explicit scale factor l/FZ can be omitted from the equations without loss of generality. 
However it is required in the practical implementation with real data, where the focal distance 
of the camera F and the absolute distance Zq are known, to directly compare the estimated 
parameters with the known experimental parameters of the set-up. This scale ambiguity can 
be turned into an advantage. 

Very powerful simplifications can be obtained in cases where the temporal shading variations 
only depend on the unit normal n as opposed to both n and n. This distinction comes about 
because the shading equation usually only depends on the direction of the normal, i.e. n, while 
the reciprocal of the depth is expressed in terms of the full normal n, and both the direction 
and the magnitude of the normal are relevant information. Therefore, the temporal derivative 
of the shading expression can depend on n and n concurrently when the distance of the patch 
to the source or camera is involved (like the nearby source case). The normal n can be replaced 
by the unit normal fi in the term (r • n), that arises from the expression of the reciprocal of 
the depth in terms of the surface coordinates, producing yet another scaling of the translation 
vector t. This substitution results in a FICE that only depends on the unit normal, and the 
vectorial equation (2.28), obtained by differentiation of the minimization equation (2.25) in 
section 2.3.1.3, simplifies to 

Jl (( _(S ' ^fn" - PX l£) ^ (u '' t ' n) - '* e 0) - "W^Js) dt + An = 0. 

In order to fully specify the planar patch FICE, a specific shading model is required. Only 
Lambertian surface reflectances are considered in this thesis, as previously mentioned. The 
Lambertian shading model is determined by the type (punctual or extended) and position 
(nearby or distant) of the light source(s) and by the characteristics of the albedo function. 
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In this study, the albedo p\(a,f5) is either constant, the surface is textureless and all the 
spatiotemporal variations of the irradiance function are due to shading, or continuously varying 
and the spatiotemporal variations are due to both shading and texture. More specifically, the 
spatiotemporal gradients of a textured surface have a component caused by the surface markings 
(texture) and a component that is due to the spatiotemporal variations of the shading on the 
surface. 



4.1.1 Distant Punctual Source Illuminating a Lambertian Patch 

The simplest Lambertian reflectance model that can be considered is the one with a distant 
punctual source and a constant albedo p\. It is easily seen that the irradiance (equation 3.5) 
is spatially constant and that the temporal rate of change of the irradiance (equation 3.4) is 
constant. Since the irradiance is spatially constant, the spatial gradient field E T is zero and so 
are the vector fields v and s. It is clear, without further equations, that the problem is grossly 
underconstrained and cannot be solved even by considering multiple frames. 

Assuming a set of discrete constant patches leads to a similar problem. The problem is 
underconstrained within each patch, and the previous equations are not valid at the patch 
boundaries. 

In order to obtain more information, in the distant source case, a Lambertian surface with 
a continuously varying albedo is required. The source strength Lq, which cannot be estimated, 
is set to unity and the albedo p\(a,/3) is denoted by p(r) to emphasize the fact that it is, 
indirectly, a function of the image coordinates. It should be noted that the notation p(r) is 
deceptive because the albedo depends on the surface coordinates (a,/3) and not on the image 
coordinates r. However, (a, fi) can be expressed (see section 3.3.1) in terms of the image 
coordinates r and vice-versa. This issue is not relevant in the two-frame case because texture 
does not need to be tracked there. The two-frame case can be exclusively specified by the 
spatiotemporal gradients of the irradiance. In practice, a lot of synthetic data used in the 
two-frame situation are produced by directly generating the spatial gradients and computing 
the temporal gradients directly from them by means of the CE. However, this approach failed 
in the case of multiple (more than two) frames because texture needs to be tracked in this case. 
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4.1.1.1 Solution Using the FICE 

This section deals with the case of a planar Lambertian surface with a smoothly varying albedo 
and illuminated by a single punctual source of known direction with straight FICE used as a 
constraint equation. This case is solved by a minimization problem of type (b') (equation 2.24) 
with a unique parameter constraint on the unit normal n. This minimization is identical to the 
generic minimization of section 2.3.1.3, where the unit normal n is substituted for the normal n 
because the temporal variations of the shading model only depend on n. The generic functions 
£(n), e(L, n) and e t (L, n,u>,t) are specialized to 

C(n) = (p.n) 
e(L,n) = />(r)(l.n) 
e t (L,n,u?,t) = />(r)[l,u>,n]. 

If we let F(u>, t, h) — E t — (v • u>) - (r • n)(s • t) - />(r)[l, u>, n], the unconstrained minimization 
equation 2.25 can be written in the form 

min (^(f 2 + M (r)(F(r,0 - p{v){\ • h)))dr + A(||n|| 2 - l)) (4.1) 

and the resulting vectorial system, obtained by differentiation of (4.1) with respect to the 
parameters t, u> and n, takes the form 



// 



//( 



F(-v-p(r){hxl))dr = (4.2) 

F(-s(r-h))dr = (4.3) 

F(-(s • t)r - />(r)(o? X 1)) - n(r)p(r)tjdr + An = 0. (4.4) 



II 



The Lagrange multiplier A and the Lagrange multiplier function /x(r) can be eliminated by 
taking the dot product of equation (4.4) with the vectors n and 1 and by solving the resulting 

scalar linear system in the unknowns A and // /i(r)p(r)dr. If we let Yf = —z — — and 

JJa ||1 x n|| 2 

n^ = ^— : , the Lagrange multiphers are given by 



II xn 



ft n{T)p{v)dr = - II F(l* • ((. • t)r - p(r)(a» X \)))dt 
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\ = -JJf(u 1 ± . ((s . t)r - ,(r)(o, X l)))dr 

and can be eliminated from (4.4) to yield (if (1 • n) ^ 0) 

II F(-(s ■ t)(I 3 - l?F)r + />(r)(I 3 - nWHu, X \))dr, (4.5) 

where the albedo />(r) is expressed in term of the irradiance data by means of the shading 
equation, p(r) = E(r,t)/(L- n). 

Equations (4.2), (4.3) and (4.5) form a system S of three nonlinear vectorial equations and 
can be solved by any of the methods described in section 2.3.2, once it has been projected 
onto suitable axes. An alternate, more efficient, way of solving S is to notice that the first 
two equations (4.2) and (4.3) are linear in u> and t; that is, the system S is in fact semilinear 
and can be broken into a linear system C formed by the first two equations and a nonlinear 
equation M (4.5) in n. This semilinear system can be solved iteratively: C is solved using 
the current estimates of n, then J\f is solved using the estimates of u> and t. The iteration is 
initially started with an arbitrary value for n. The advantages of this method, as opposed to 
a general global nonlinear method, are the speed, the improved convergence performance and 
its elegance. The increase of speed is partly due to the fact that two of the vectorial variables 
are solved by a linear system and partly due to the fact that we are dealing with a nonlinear 
equation as opposed to a system of nonlinear equations. The improvement in convergence is 
caused by the lower dimensionality of the nonlinear part of the system. Experiments suggest 
that the convergence of the higher dimensional nonlinear system is far worse than the semilinear 
method. Unfortunately, the convergence of the semilinear iterative implementation is difficult 
to prove or support, even intuitively. The elegance results from the ability to directly solve the 
vectorial linear system without need of projecting it onto a specific axis. C can be written in 
the matrix form 



(4.6) 
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where 

Ml = ff [{-* + &)(** *))(* + **)(** fy*]** 

= jf [(w r ) + p(r)((v(n X If) + ((n X l)v r )) + p(r) 2 ((n X l)(n X if)] dr 

M 2 = jy(r.n)((v + p(r)(nxl))s r )dr 

M 4 = //(r.n) 2 (ss r )dr 

ei = [J E t (v + p(r){hx\))dr 

e 2 = // E t (r- n)sdr 

and the solution to the system (4.6) is given by 

t = (M 4 -M^Mi- 1 M 2 )- 1 (M^M^ 1 e 1 -e 2 ) 
u> = -Mi ^ei + M 2 t) 

In order to numerically implement the nonlinear vectorial equation (4.5), it needs to be 
broken down into scalar components. It is possible that the eigenvectors of the two matrices 
M.T = (lji r -I 3 f and M^ = (nj L n r -I 3 f are advantageous projection axes. The eigenvectors 
of the matrix M?" are 1, associated with the single eigenvalue 0, and any two linearly independent 
vectors orthogonal to 1*, associated with the double eigenvalue 1. The eigenvectors of the 
matrix M^ are n, associated with the single eigenvalue 0, and any two linearly independent 
vectors orthogonal to n^ associated with the double eigenvalue 1. The eigenvectors n and 1 are 
unsuitable because the dot product of n with the nonlinear equation (4.5) produces a scalar 
equation linearly dependent on the equation (4.3) of the system S, while the dot product with 
1 produces a null equation. However, the use of the common eigenvector (1 X n) of M. and 
MT results in a simple scalar equation. In our numerical implementation, the two other scalar 
equations were obtained by projection onto the basis x and y-axis. 

At first glance, it might appear that solving the different equations and the linear system is 
very costly, since the integrations over the whole image J need to be performed at each iteration. 
Appendix F shows that most of the components of the vectors and matrices in the previous 
equations can be precomputed resulting in a very efficient numerical implementation. 
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4.1.1.2 Solution Using the Dynamical Frame Unwarping FICE 

The previous section assumed that the simple FICE was used in the minimization procedure. 
In practice however, the DFU incremental FICE is used, and the previous equations need to 
be slightly altered. More specifically, the temporal irradiance gradients are replaced by the 
displaced frame difference, D E (T,t,T,u>( n \tW), denned in section 2.3.3, the vector fields v and 
s now depend on the current estimate of the motion and, finally, the motion parameters are 
replaced by the incremental motion parameters. Under this framework, the linear system C 
(4.6) becomes 



(4.7) 



where 



Mi(w< n >,t< n )) M 2 (u?W,t( n )) \ / »-«(") \ _ / di(»( n ),t< w >) 
Mf(w("),t(")) M 4 («H,t(")) J \ t-tW / \ d 2 (t»,t(")) 

d x (c»,tW) = ff D E (r,t,T,^ n \t^){v(^ n \t^) + p(r)(hx\))dr 
d 3 («< n >,t< n >) = //^(r,t,T,u.W ) tW)(r.n)s(u,( n ),t( n )) ( fr, 

and the system (4.7) is solved for u>( n+1 ) and t^ n+1 \ the (n + l) th estimates of the translation 
t and rotation w. 

Similar adjustments of variables and parameters are performed on the nonlinear equation 
(4.5) or on the overall system S when considered globally as a system of nonlinear equations. 
In practice, the straight FICE implementation is identical to the DFU implementation, where 
the extra iterative loop, which computes the incremental motion parameter, is disabled and the 
field unwarping step suppressed, resulting in motion independent vector fields v and s. 

4.1.2 General Punctual Light Source Illuminating a Lambertian Patch 

The previous section dealt with the case where there were no shading variations induced by the 
proximity of the light source. The advantage of the previous case is its simplicity; its drawback 
is its inability to deal with textureless surfaces, because in the distant case, the irradiance and 
the temporal variations of the irradiance on the surface are constant. This section examines 
the use of the general Lambertian shading model (3.1), presented in section 3.1, in the FICE. 
The complexity of the temporal variations of the shading model (equation (3.2) in the general 
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case and equation (3.3) in the first-order Lambertian model) is far greater than in the distant 

Z p 

case and is mostly due to the presence of the structure term R = -=7 r, which depends on 

b (r • n) 

dE 
n, in -7-. Due to its complexity, the full model is more appropriate in the case of textureless 
at 

surfaces (p constant) and/or in the case of special motions like pure translation or rotation, 
than in the general motion and texture case. 

In order to slightly simplify the presentation of the general case, the equations shown in 
this section are derived for the textureless surface case. In this instance, a minimization with 
a single constraint on the parameter n is performed and the vectorial equations only contain a 
single Lagrange multiplier A associated with the parameter constraint ||n|| 2 = 1. The general 
texture case would be treated in a fashion similar to the distant case of section 4.1.1, and the 
Lagrange multipliers eliminated in an identical way. 

4.1.2.1 General Model 

Let F(u>,t, n) = E t - (v -u>) - (r • n)(s • t) - E ga i where E ga i represents the temporal variations 
of the general Lambertian case, i.e. 

E ga l = ||£i| ([l,«,ft] + (L • n)[L,a,,R] - (t£ • n)). 

The unconstrained minimization equation can be written in the form 

E = nun ( II F 2 dr + A(||fi|| 2 - l) V 

and the resulting vectorial equations take the form 

/^(-v-||^((ixn) + (L.n)(LxR)))dr = (4.8) 

|jfF(-s(r.n)-^(I 3 -LL r )n)dr = (4.9) 

^F(-(s.t)r + ^)<fr + An = (4.10) 



where 



(f) TL + M( (lxw) + t[t ' u, ' R] + (t ' fl) (t) r(txw) - ti ) 



dE M * L = _E_ (dK\ T T ^_A_{,^ ,.., ^ f rf . ,.. wl ^ „. . A ^ (8*\ T r 

dh HLI 12 
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,,9R -1 TlVT j^ fdn\ T 1 /_ zn T \ 

with -^r = -. rzTT 1 N A and N ft = — = -^^ I 3 - ,. M • 

dn (r-n) 2 \9nJ n-z\ (n-z)/ 

It is noteworthy that the above system is also semilinear in t and u> and can be solved 

with a method similar to the one described in section 4.1.1 and the linear portion of the 

system is only sightly more complicated than its distant counterpart. On the other hand, 

the nonlinear equation is far more difficult to implement numerically in the general case. More 

robust numerical implementations are achieved in special motion cases, as in the pure translation 

case (u> = 0), where the nonlinear equation (4.10) simplifies to 



JJ/h^-mh^w^r^*-- - (4 - u) 



The general case described in this section is only relevant when the light source is very close 
to the surface relative to distance of the plane to the lens. Very often the light source is further 
away, and it can be assumed that ||R||/||1|| < 1. The relative proximity of the light source still 
induces a variable shading across the plane surface but the overall implementation is simpler 
as we will see in the next section. 

4.1.2.2 First-Order Model 

Under the assumption that the relative distance from light source to object and from object 
to camera is small, the shading equation and the temporal variations of shading equation can 
be approximated by a first-order Taylor series with respect to -r—rr- (see section 3.1.2). For a 
planar patch, the shading equation takes the form 

^• i ) = "(( 1 - a »-?FJj(( r -('- , '"- ft )w)' 

i.e. the distant uniform shading is modulated by a first-order, in -rprjj-, shading variation across 
the planar surface. The temporal variations Efi rst are computed by equation (3.3) and can be 
rewritten, in the planar patch case, in the form 
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If we let F = E t - (v • <*>) - (r • n)(s • t) - Efi rtt , the resulting vectorial equations, which define 
the parameters t, u> and n, are, for a textureless surface, 

//'(-'( , - + f^^))*-' 

///H- ft »-^-" lF 'ii)* = 



where 



dE first ( (Xv . , IHIlU^ ft ||r|| rn r y t*. \ 



|r-n| V J ||1|| n (r-n), 
These equations are numerically more stable than the equivalent general shading equations 
of the previous section. Moreover, they represent a reasonable approximation of the general 
equations for ratio —^- up to about 10% and should be preferred if the speed required for 
solving the equations is more important than the accuracy of the solution. 

4.1.3 Attenuated Lambertian Model 

The last model presented is the attenuated Lambertian model under the assumption that the 
light source and the viewer are coincident. This model was developed in section 3.2 for the case 
of underwater photography with the light source mounted on the camera. It is described by 
the shading equation 

^ r '*) ~UTU*~( ' n ' ~ " llwll 2 ( ' '' 
\\ L \\ ll K ll 

its temporal variations of shading, expressed by equation (3.9) can be rewritten in the form 

If we let F = E t — (v >u>) — (r- h)(s-t) — E at t, the system of vectorial equations in the parameters 
t, u> and n, for a textureless surface, is simply given by 



//*(-*.*,- 



// Fvdr = 


(4.12) 


Z 2 (r.n) 3 ut T )h \ dr = 
H F 2 ||r|| 3 l ' / 


(4.13) 


^-] (I 3 - 3rf r )t) de + An = 0. 


(4.14) 
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4.1.4 Conclusions on the Planar Implementations 

The previous sections presented several implementations of the rigid body motion and structure 
recovery problem with shading in the planar patch case. The vectorial systems of equations 
defining the motion parameters t, u> and the structure parameter n were derived for several 
shading models, different types of albedo functions and for the regular and dynamically unwarp- 
ing frame FICE. In most instances, the equations were given in the simpler case of a textureless 
surface and used the FICE as a constraint equation. Section 4.1.1 showed how to derive the 
relevant equations when the shading equation is used as a constraint in the general texture 
case. The method used in the distant case is general, and only the form and complexity of 
the Lagrange multipliers change in the other cases. Section 4.1.1.2 explained how to modify 
the equations derived with the FICE, in order to obtain the equations under the DFU case. 
The modifications amount to introducing an extra iterative loop for the computation of the 
incremental motion parameters and to updating the now variable vector fields s and v. No 
attempt was made to present the scalar equations that were actually numerically implemented 
because this operation is very tedious and no additional information is gained in displaying the 
scalar equations. 

The next section presents a set of examples that illustrates the performance of the proposed 
algorithms, compares the results to those given by a classical implementation, when appropriate, 
and explains the numerical difficulties present in the implementations. 



4.2 Examples 

The examples presented in this section attempt to demonstrate the performance of the algo- 
rithms in various situations with both synthetic and real data. Synthetic data are used in most 
examples because, in many cases, the experiments could not be simulated, easily and reliably, 
with real data, and synthetic data provide a way of evaluating the accuracy of the estimates by 
comparing them to the true values. Synthetic data were produced by ray-tracing an analytical 
texture function on a moving plane. This method allows perfect control of all the lighting, 
structure and motion parameters and produces high quality images at various resolutions. The 
irradiance data were quantized to eight bits and only the quantized data were used in most of the 
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experiments where the algorithm estimates the spatiotemporal gradients from these irradiance 
data. 

Real data acquisition is very delicate and it was extremely hard, given the equipment and 
experimental set-up available, to control lighting and to measure accurately the motion param- 
eters and camera parameters such as focal distance. Nevertheless, real data were used in the 
experiments because they were the driving force behind the development of the multiple-frame 
DFU algorithms which allow the estimation of the rigid body motion and the structure directly 
from the irradiance data obtained from a video camera. However, the structure and motion 
estimates that are obtained are difficult to interpret, as far as accuracy is concerned, because 
the real parameters are only approximatively known. 

Real data are used in the distant Lambertian case, while synthetic data are produced in the 
distant, nearby and attenuated Lambertian case for a general albedo function and textureless 
surfaces. The use of synthetic data in the nearby punctual source and the attenuated cases is 
a necessity because no real, meaningful data could be obtained in these instances. Most of the 
examples presented in this section use the DFU formulation of the FICE, and multiple frames 
are used in the real data cases. In this section, the only goal of the experiments with real 
data is to obtain the best estimates possible without regard to the specifics of the multiframe 
algorithm. The discussion of the performance of the algorithms, as a function of the number of 
frames used, is delayed until chapter 6. 

The examples shown in this section demonstrate the superiority of the FICE with respect 
to the CE in two ways: more accurate results are obtained in the case of textured surfaces 
and the FICE formulation is able to solve the case of textureless surface, where all the spa- 
tiotemporal variations are due to the shading, unlike the CE formulation that cannot deal with 
weakly marked or textureless surfaces. However, these new, improved results have a higher 
computational cost than those obtained by the less accurate and general CE formulation. 

4.2.1 Distant Source 

The distant, or hemispherical, source 1 illuminating a Lambertian surface case represents the 
most basic and fundamental situation. Under the assumption that the surface being imaged 



'Distant and hemispherical sources are not equivalent but have similar behaviors as it was explained in 
section 3.2. 
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has a reflectance that is a good approximation of a Lambertian reflectance function, this case 
corresponds to the generic video recording situation. As such, the images can be processed, to 
compare the performances, with the algorithms described in this thesis and with conventional 
algorithms that use the classical constraint equation. 

The closest, comparable classical algorithm is the direct motion algorithm for planar surfaces 
of Negahdaripour and Horn (1987). A direct, fair comparison with the experiments published in 
the cited paper is unfortunately impossible for two reasons: they did not specify the parameters 
of the multiplicative sinusoidal patterns used in the experiments and the image gradients are 
computed analytically assuming that the constraint equation is satisfied. Under these condi- 
tions, the motion and structure parameters are recovered perfectly and their algorithm displays 
perfect performance. Unfortunately, these data are unusable in the general case, since they 
assume, a priori, that the CE holds and discard, by construction, all shading variations in- 
formation. No experimental results are provided in the situations where the spatiotemporal 
gradients are estimated directly from an irradiance pair of images (synthetic or real), and the 
motion and structure parameters are computed from these gradient fields. Since no method is 
provided to compute the required gradients from the irradiance images, the FICE implemen- 
tation is used in both the CE and the FICE cases when comparing the performance of the two 
algorithms in the presence of shading induced variations. When used in a CE mode, the FICE 
implementation estimates all the gradients using the DFU method, but the temporal shading 
variations are assumed to be zero. In this way, the best possible spatiotemporal gradients are 
used in the two implementations and the difference in performance is only due to the additional 
use of the shading information in the FICE implementation. 

Three types of experiments were run for the distant source case. The first experiment uses 
purely synthetic gradient data and demonstrates the ability of the system to correctly determine 
the solution under idealized conditions. The data of the first experiment are also used in the CE 
case, to show the bias that is obtained in the parameter estimates when the temporal shading 
variations are neglected. The second experiment is similar to the first one but uses synthetic, 
8 bit quantized irradiance data, from which the spatiotemporal gradients are estimated. The 
third experiment uses real data. 
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4.2.1.1 Synthetic Data 

The first experiment requires exact synthetic irradiance gradient data in order to evaluate the 
accuracy and the sensitivity of the nonlinear methods with respect to the initial estimates in the 
FICE case. The data are generated by analytically computing the irradiance spatial gradients in 
the image plane from the gradients of an analytical texture function p(a,0) on a mapping plane 
and from the position of a plane in space using the Jacobian of the transformation that maps 
the original mapping plane to the image plane (see section 3.3.1 for the relevant equations). The 
temporal gradients are computed directly from the FICE equation given the spatial gradients, 
the motion and structure parameters and the light source direction, that is, 

Etfrt) = -(v(p,t) •«) - (r -n)(s(r,t) -t) + p{r,t)[\,u>,h}. (4.15) 

The irradiance data E(r, t) are also computed analytically since they are required in the FICE 
formulation when the shading equation, which relates the irradiance values to the value of 
the texture on the surface, is used as a constraint. Figure 4.1(a) shows the irradiance data, 
figure 4.1(b), the temporal gradients and figures 4.1(c,d), the spatial gradients (E x and E y 
respectively). The texture used in the previous example is a sinusoidal grating with a sinusoidal 
phase variation. More specifically, the texture is specified by the expression 



E(x,t) = smlu x x+ (-sin(yy) J j 
E(y,t) = sm\w y y+(l«ii(^x) > )j 



In the first phase of the experiment, the analytically computed fields E(r,t), V T E(r,t) and 
Et are used directly as an input to a two-frame, non-DFU implementation of the FICE algo- 
rithm, and the structure and motion parameters evaluated with various numerical implemen- 
tations. The purposes of the experiment are to validate a given numerical implementation, to 
analyze its performance in terms of the initial estimate and number of iterations, and to demon- 
strate the validity of the algorithm, i.e. its ability to recover perfectly the parameters under 
these idealized conditions. Five numerical implementations were tried, three semilinear where 
the nonlinear equation was implemented with a minimization method (Levenberg-Marquardt), 
a direct method (Powell's hybrid) and an homotopy method, and two globally nonlinear meth- 
ods: Levenberg-Marquardt and Powell's hybrid. In general, the globally nonlinear methods 
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Figure 4.1: Cosine grating with sinusoidal phase variations on a slanted plane, (a) (top left) is 
the irradiance image, (b) (top right) the temporal gradient, (c-d) (middle left and right) the 
x- and y-gradients. 
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failed to converge to the correct solution or simply failed to converge for an arbitrary initial es- 
timate of all the parameters, unless the initial estimate was close enough to the correct solution, 
in which case the two methods perform equally well and converge to the correct solution within 
the specified tolerance (usually 10~ 6 or 10 -8 ). To insure convergence to the correct solution, 
the globally nonlinear methods require a fairly accurate estimate of all the parameters. Such an 
estimate can be obtained by the simpler CE implementation for an arbitrary surface orientation 
and rigid motion, or for specific initial values of the normal to the plane, the translation and 
the rotation. The initial estimate of the normal can be set to (0 1) (frontal plane) and zero 
for the translation and rotation vectors in situations where the slant and tilt of the plane is 
small (less than 10°) and the motion is small (less than .5-8° for the elementary rotations and 
less than .5 to 1 pixel translation in either directions). 

The semilinear methods are somewhat more forgiving as far as the initial estimate is con- 
cerned but are surprisingly slow due to the huge number of iterations required to converge to 
the correct solution. Up to 1000 iterations may be required to converge for a moderately tight 
tolerance 6 of 10 -6 , i.e. the relative variation between two successive estimates is less than S. A 
random initial value for the normal resulted in the convergence to the wrong solution in a few 
cases, although an initial estimate of (0 1) (frontal plane) never failed to converge to the right 
solution and was chosen as the default initial value unless a more specific value was determined 
by other means. The homotopy method converges to the correct estimates in far fewer itera- 
tions, but the saving in number of iterations is somewhat offset by the cost of each iteration 
(10 to 20 times the cost of a typical nonlinear implementation using Levenberg-Marquardt or 
Powell's). Table 4.1 shows the true motion and structure parameters as well as the initial value 
used for the simulation. Table 4.2 presents the results of a typical simulation using a semilinear 
method which implements the nonlinear equation with the Powell hybrid method, and table 4.3 
displays the results for a typical simulation using an homotopy method. 

In the second phase of the first experiment, the same synthetic data are run through a 
classical CE implementation of the algorithm. Table 4.4 shows the final results and relative 
errors corresponding to the parameters of table 4.1, and also displays the results for a case 
similar to the previous one with the exception of the rotation vector that is set to one-tenth 
of the previous value. As expected, both simulations produce biased estimates and the bias 
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True rotation in radians: 


u>x = .00698 


u> 2 = -.00524 


u>3 = .00873 


True rotation in degrees: 


u?i = .4 


a>2 = -.3 


U?3 = .5 


True translation: 


t x = .00781 


t 2 = .00469 


t 3 = -.01172 


True translation in pixels: 


ti = 1.0 


t 2 = .4 


t 3 = -1.5 


True normal: 


n x = .4663 


n 2 = -.2956 


n 3 = 1.0 


Orientation of plane: 


Slant = 15° 


tilt = 25° 




Initial guess for normal: 


ni = .0 


n 2 = .0 


n 3 = 1.0 



Table 4.1: Motion and structure parameters and initial values used in the planar experiments 
with synthetic data. 



Iter 




Rotation 






Translation 




Normal 




No 


U>! 


U>2 


c 3 


ti 


t 2 


t 3 


ni 


n 2 


n 3 


1 


.00929 


-.00192 


.01103 


.00326 


.00657 


-.01131 


.12152 


.01013 


1.0 


2 


.00888 


-.00220 


.01077 


.00349 


.00616 


-.01129 


.17112 


-.00552 


1.0 


3 


.00880 


-.00239 


.01064 


.00368 


.00617 


-.01139 


.19854 


-.01630 


1.0 


4 


.00876 


-.00254 


.01055 


.00385 


.00617 


-.01143 


.21689 


-.02486 


1.0 


5 


.00872 


-.00267 


.01049 


.00401 


.00617 


-.01146 


.23116 


-.03237 


1.0 


10 


.00858 


-.00316 


.01024 


.00464 


.00612 


-.01151 


.28042 


-.06353 


1.0 


20 


.00842 


-.00356 


.01001 


.00518 


.00603 


-.01153 


.31890 


-.09374 


1.0 


30 


.00819 


-.00401 


.00973 


.00584 


.00585 


-.01156 


.36072 


-.13369 


1.0 


40 


.00808 


-.00417 


.00961 


.00608 


.00576 


-.01158 


.37563 


-.15016 


1.0 


50 


.00790 


-.00443 


.00943 


.00647 


.00560 


-.01159 


.39817 


-.17782 


1.0 


100 


.00733 


-.00499 


.00896 


.00739 


.00505 


-.01166 


.44635 


-.25316 


1.0 


150 


.00716 


-.00512 


.00884 


.00761 


.00487 


-.01169 


.45698 


-.27452 


1.0 


200 


.00705 


-.00519 


.00877 


.00774 


.00476 


-.01171 


.46298 


-.28778 


1.0 


300 


.00699 


-.00523 


.00873 


.00780 


.00470 


-.01172 


.46585 


-.29452 


1.0 


400 


.00698 


-.00524 


.00873 


.00781 


.00469 


-.01172 


.46624 


-.29545 


1.0 


449 


.00698 


-.00524 


.00873 


.00781 


.00469 


-.01172 


.46629 


-.29556 


1.0 



Table 4.2: Evolution of the motion and structure parameter estimates for experiment one using 
FICE. Semi-linear implementation with a Powell hybrid method for the nonlinear equation. 
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Iter 




Rotation 






Translation 




Normal 




No 


wi 


U>2 


U>3 


ti 


t 2 


t 3 


ni 


n 2 


n 3 


1 


.00919 


-.00166 


.01108 


.00293 


.00648 


-.01119 


.04712 


-.16159 


1.0 


2 


.00824 


-.00211 


.01066 


.00352 


.00582 


-.01165 


.10701 


-.21627 


1.0 


3 


.00783 


-.00283 


.01027 


.00441 


.00548 


-.01175 


.18785 


-.23505 


1.0 


4 


.00765 


-.00369 


.00984 


.00549 


.00530 


-.01168 


.28642 


-.23649 


1.0 


5 


.00753 


-.00447 


.00942 


.00654 


.00515 


-.01157 


.37468 


-.23860 


1.0 


10 


.00705 


-.00522 


.00877 


.00778 


.00474 


-.01167 


.46232 


-.28711 


1.0 


15 


.00699 


-.00523 


.00873 


.00781 


.00469 


-.01171 


.46592 


-.29476 


1.0 


20 


.00698 


-.00524 


.00873 


.00781 


.00469 


-.01172 


.46626 


-.29552 


1.0 


25 


.00698 


-.00524 


.00873 


.00781 


.00469 


-.01172 


.46630 


-.29559 


1.0 


30 


.00698 


-.00524 


.00873 


.00781 


.00469 


-.01172 


.46630 


-.29560 


1.0 



Table 4.3: Evolution of the motion and structure parameter estimates for experiment one using 
FICE. Semi-linear implementation with an homotopy method for the nonlinear equation. 



increases with the magnitude of the rotation vector. The relative error in the rotation vector 
is higher for the smaller rotation, but it is not significant since the magnitude of the rotation 
is very small and the absolute error is barely 3 X 10 -5 radians, i.e. 1.7 X 10 -3 degrees. This 
experiment demonstrates that the FICE provide better performance in the case where the 
temporal variations of shading, albeit small, should not be neglected and the computational 
cost is justified. 

The second experiment relies only on the 8-bit quantized, synthetic irradiance values. The 
synthetic image is the same as the one used in the first set of experiments and shown in figure 4.1. 
In this experiment, the spatiotemporal gradients are estimated directly from the quantized 
irradiance data. On a gray level picture, the estimates of the gradients are virtually impossible 
to distinguish from the analytically computed gradients, but the effect of the quantization of 
the irradiance and of the defects due to the estimation process can be seen, for example, on 
the contour plot of the gradient field E x . Figure 4.2 shows the contour plot of the analytical 
gradient field, while figure 4.3 displays the contour plot of the estimated field. Small variations 
in levels and contour smoothness can be observed. These variations in turn affect the accuracy 
of the estimates of the motion and structure parameters. Table 4.5 recapitulates the final 
estimates of the parameters for the synthetic gradients and synthetic irradiance cases using the 
FICE and CE algorithms. As expected, the estimated parameters are not identical to the true 
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True rotation in degrees: u?i = .4, u>2 = —.3, 0*3 = .5 


CE algo. 


Rotation 


Translation 


Normal 


Final iter. 


.00700 -.00531 .00871 


.00794 .00467 -.01166 


.48020 -.2835 


1.0 


Rel err (%) 


.3 1.3 .2 


1.7 .4 .5 


3 4.2 


.0 


True rotation in degrees: u\ = .04, u>2 = —.03, u>3 = .05 


CE algo. 


Rotation 


Translation 


Normal 


Final iter. 


.00073 -.00054 .00087 


.00784 .00472 -.01172 


.46609 -.2927 


1.0 


Rel err (%) 


6.4 14 1 


.4 .6 .0 


.06 .1 


.0 



Table 4.4: Final estimates and relative errors of the motion and structure parameters using the 
CE for two values of the rotation vector. Other parameters are determined by table 4.1. 





Rotation 


Translation 


Normal 


True para. 


.00698 -.00524 .00873 


.00781 .00469 -.01172 


.4663 -.2956 1.0 


FICE est. 


.00702 -.00533 .00870 


.00798 .00466 -.01160 


.4852 -.2895 1.0 


Rel err (%) 


.6 2.1 .3 


2.2 .6 1.0 


4.0 4.1 0.0 


CE est. 


.00706 -.00552 .00867 


.00829 .00462 -.01149 


.5191 -0.2654 1.0 


Rel err (%) 


1.1 5.3 .7 


6.1 1.5 2.0 


11.3 10.2 0.0 



Table 4.5: Final estimates and relative errors of the motion and structure parameters using the 
FICE and CE on the synthetic quantized irradiance sequence depicted in figure 4.1. All the 
parameters are specified by table 4.1. 



ones, although the variations are small (on the order of 1-4%) and the bias between the results 
computed with the FICE and CE is about constant. This result is not surprising since the 
implementations of the FICE and CE are almost identical. 



This experiment demonstrates that the algorithm is robust in the case of noiseless, quantized 
data and very good performances are achieved in the estimation of the motion and structure 
parameters. No noise study of the two-frame implementation is performed in this chapter since 
a study of the performance of the general multi-frame algorithm with respect to noise is done 
in chapter 6. 
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Ex-256: h=256, v=256, min=-27.58, max=37.44 




f \ \ \ \ 
: «. \ \ » 



Figure 4.2: Contour plot of the analytically computed horizontal gradients of the lower right 
quadrant of the irradiance image shown in figure 4.1(a). 
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IEx-256: h=256, v=256, min=-27.70, max=37.45 




Figure 4.3: Contour plot of the estimated horizontal gradients of the lower right quadrant of 
the 8-bit quantized image shown in figure 4.1(a). The plot is the contour plot of the lower right 
quadrant of the horizontal gradients depicted in figure 4.1(c) 
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Camera parameters: F = 60mm, pixel size l/60mm 
Distance focal plane to object: Zq = 1250mm 




Rotation in degrees 


Translation in pixels 


Normal 


Measured parameters 


.0 .0 .5 


1.0 -1.0 .0 


.0 .0 


1.0 


Final estimates 


.031 -.081 .590 


.682 -.695 .0814 


.0542 -.034 


1.0 


Absolute rel. error (%) 


3.1 8.1 18.0 


31.8 30.5 8.1 


5.4 3.4 


0.0 


Corrected estimates 


.031 -.081 .59 


.859 -.836 .0814 


.0542 -.034 


1.0 


Absolute rel. error (%) 


3.1 8.1 18.0 


14.1 16.4 4.1 


5.4 3.4 


0.0 



Table 4.6: Final and adjusted estimates and corresponding relative errors of the motion and 
structure parameters using the multiframe DFU FICE on the real irradiance sequence shown 
in figure 4.4. 



4.2.1.2 Real Data 



The last set of experiments that were performed for the distant source case involves real data. 
The object used in these experments is a plane covered with a highly textured, matte piece of 
wall paper illuminated by a diffuse light source. Due to the primitive experimental set-up, only 
two degrees of freedom are available, a translation along the optical axis and a rotation around 
the optical axis. Two more degrees of freedom are generated, two translations parallel to the 
image plane, by synthetically shifting the irradiance images by an integer number of pixels in 
the x and y directions. 

The results of the experiment need to be taken with a grain of salt since all the measurements 
were done directly on the experimental set-up, and the rotation was performed manually and 
measured with a protractor. The real purpose of the experiment is to show that plausible 
results can be obtain with real images using the multi-frame DFU algorithm. Figure 4.4(a) 
shows the irradiance data, figure 4.4(b) the temporal gradient, and figure 4.4(c-d) the spatial 
gradients used in the experiment. The gradients are computed using the DFU method for 
the approximate motion parameters shown in table 4.6. Table 4.6 summarizes the measured 
parameters of the experiment and the computed motion and structure parameters. 

As expected, the computed and measured parameters are fairly different. Several factors can 
explain the discrepancies. The planar surface being imaged was not rigorously perpendicular 
to the optical axis, resulting in some extraneous, nonzero n x and rij, components as well as 
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Figure 4.4: Real data of matte piece of wallpaper on a frontal plane, (a) (top left) is the 
irradiance image, (b) (top right) the temporal gradient, (c-d) (middle left and right) the x- 
and y-gradients. 
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spurious u> x and u> y components. The value of the rotation around the optical axis is reasonable 
considering the crude measured effective rotation. On the other hand, the estimate of the 
translation is rather poor both in terms of the relative size of the each component and in terms 
of the overall scale factor that is determined by the measured distance Zq and the effective focal 
distance F of the camera. However, Zq is subject to measurement errors and the advertised 
value of the lens focal length (60 mm) was used; it is known that this differs substantially from 
the effective focal length. The anomalies in the x and y components of the translation vector 
can be explained by the fact that the center of rotation of the planar surface was not exactly 
coincident with the intersection of the plane with the optical axis; therefore, the measured 
translation is in fact the sum of the true translation, i.e. the translation obtained when the 
optical axis and the axis of rotation are coincident and two translation components in the plane 
perpendicular to the optical axis. This assumption was verified experimentally by running the 
algorithm on a set of data obtained from the same set-up but with only a rotation around the 
optical axis and no translation. The true center of rotation was estimated from the resulting 
optical flow and found to be approximately at the coordinates (13, 9) from the intersection 
of the optical axis with the plane. This position was confirmed by the measurement of the 
focus of expansion location on a sequence, where the motion was a pure translation along the 
optical axis. Another set of experiments were run with only synthetic translations in a plane 
parallel to the image plane in order to estimate the overall scale factor for the translation 
vector; the correction was found to be 1.08. The final estimates obtained during the original 
experiment were adjusted to take into account the induced translation provoked by the off- 
center center of rotation and the overall translation scale factor and the corrected values are 
also shown in table 4.6. The compensated values are far better than the raw estimates and are 
very respectable. 

4.2.2 Nearby Source 

The case of a planar Lambertian surface illuminated by a nearby source is richer than the previ- 
ous, distant source situation but is also much more complex to implement due to the analytical 
complexity of the expressions for the shading and temporal shading variations (equations (3.1) 
and (3.2)). The spatial irradiance gradients are produced in part by the spatial variations of 
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Figure 4.5: Exponentially damped cosine on a slanted plane, (a) (top left) irradiance image with 
far-away source, (b) (top right) irradiance image for close-up source, (c) (bottom) irradiance 
image for textureless surface. 
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Figure 4.6: Surface plot of the irradiance image of textureless surface shown in figure 4.5(a). 
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the texture and in part by the spatial variations of the shading across the surface as opposed to 
the distant case where the spatial irradiance gradients are only due to the texture. The spatial 
shading variations are induced by the proximity of the light source that is seen from a slightly 
different spatial position by each point on the surface. Figure 4.5(a) shows an exponentially 
damped zone plate texture illuminated by a distant source in direction 1, and figure 4.5(b) 
displays the same textured plane illuminated by a nearby source in the same direction. The 
difference between the two cases is best demonstrated by figure 4.5(c) that represents a texture- 
less plane, with the same geometry and position as before, illuminated with the same nearby 
source. Figure 4.5(c) is in fact the ratio of the irradiance of the planar surface illuminated by 
the nearby source by the irradiance of the same planar surface illuminated by the distant source. 
The perspective plot (figure 4.6) of the irradiance function depicted in figure 4.5(c) provides a 
more striking picture of the spatial shading variations across the surface. The reference level 
of the plot is the flat irradiance level that occurs in the distant source case. More specifically, 
the plot shows exactly the additive shading variations, across the surface, that are present in 
the nearby source case as opposed to the distant source case, i.e. it represents the difference 
between the irradiance produced by the nearby and distant sources. 

It can be observed that the spatial irradiance variations are very smooth and, as expected, 
the small spatial irradiance variations generate, in turn, extremely smooth spatial gradients (see 
figures 4.7(a) and 4.7(b) for the horizontal and vertical gradients respectively) with practically 
no contrast. These very small spatial gradients cause a lot of difficulty in determining the 
structure and motion parameters. The very smooth gradients result in a very slow convergence 
of the algorithm and can provoke instabilities. 

Table 4.7 shows the results of a typical run for a textureless planar surface with the structure 
and motion parameters described by table 4.1. These results were obtained for "near-perfect" 
data, i.e. the irradiance sequence was kept as a floating point array and the gradients were 
computed directly from the floating point sequence. Under these favorable conditions it takes 
about 16000 iterations to converge from a frontal plane initial estimate of the normal (n = 
(0 1)) and the convergence is excruciately slow as the figures in the tables show. 

Moderate success were obtained for the 8-bit quantized data with an arbitrary initial esti- 
mate of the normal. Convergence with relative error of 3-10% between the estimated parameters 
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Figure 4.7: Surface plot of the spatial irradiance gradients of the textureless surface shown in 
figure 4.5(a). (a) (top) horizontal gradient, (b) (bottom) vertical gradient. 
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Iter 




Rotation 




Translation 


] 


formal 


No 


Wl 


U>2 


U>3 


ti 


ta 


t 3 


ni 


n2 113 


1 


.01080 


.00062 


.01445 


-.00338 


.01583 


-.01379 


.00596 


-.00477 1.0 


2 


.01077 


.00057 


.01454 


-.00365 


.01633 


-.01378 


.00890 


-.00717 1.0 


3 


.01073 


.00052 


.01463 


-.00389 


.01680 


-.01378 


.01179 


-.00955 1.0 


4 


.01070 


.00048 


.01470 


-.00412 


.01723 


-.01377 


.01462 


-.01190 1.0 


5 


.01067 


.00043 


.01477 


-.00431 


.01762 


-.01377 


.01737 


-.01420 1.0 


10 


.01053 


.00023 


.01495 


-.00487 


.01887 


-.01376 


.02975 


-.02462 1.0 


25 


.01028 


-.00015 


.01477 


-.00443 


.01888 


-.01372 


.05351 


-.04490 1.0 


50 


.01006 


-.00040 


.01433 


-.00330 


.01746 


-.01370 


.07045 


-.06258 1.0 


100 


.00975 


-.00057 


.01392 


-.00230 


.01593 


-.01372 


.08060 


-.08389 1.0 


200 


.00932 


-.00063 


.01362 


-.00166 


.01463 


-.01381 


.08243 


-.11251 1.0 


300 


.00899 


-.00064 


.01347 


-.00140 


.01390 


-.01390 


.08074 


-.13369 1.0 


400 


.00872 


-.00063 


.01338 


-.00123 


.01336 


-.01396 


.07910 


-.15084 1.0 


500 


.00849 


-.00063 


.01330 


-.00109 


.01293 


-.01401 


.07795 


-.16535 1.0 


1000 


.00768 


-.00067 


.01303 


-.00063 


.01148 


-.01416 


.07796 


-.21711 1.0 


1500 


.00714 


-.00078 


.01283 


-.00028 


.01055 


-.01422 


.08368 


-.25237 1.0 


2000 


.00672 


-.00093 


.01265 


.00007 


.00984 


-.01425 


.09377 


-.28042 1.0 


3000 


.00602 


-.00151 


.01218 


.00099 


.00859 


-.01422 


.13367 


-.32902 1.0 


4000 


.00535 


-.00287 


.01122 


.00279 


.00722 


-.01392 


.23157 


-.38022 1.0 


5000 


.00516 


-.00433 


.01003 


.00477 


.00622 


-.01330 


.34746 


-.40355 1.0 


10000 


.00663 


-.00517 


.00889 


.00730 


.00503 


-.01201 


.45027 


-.31904 1.0 


11000 


.00675 


-.00520 


.00883 


.00748 


.00492 


-.01191 


.45587 


-.31125 1.0 


12000 


.00683 


-.00521 


.00879 


.00759 


.00484 


-.01185 


.45954 


-.30592 1.0 


13000 


.00688 


-.00522 


.00877 


.00767 


.00479 


-.01180 


.46194 


-.30234 1.0 


14000 


.00692 


-.00523 


.00875 


.00772 


.00475 


-.01177 


.46349 


-.29998 1.0 


15000 


.00694 


-.00523 


.00874 


.00775 


.00473 


-.01175 


.46449 


-.29844 1.0 


16000 


.00695 


-.00523 


.00874 


.00777 


.00471 


-.01174 


.46513 


-.29743 1.0 


16631 


.00696 


-.00523 


.00874 


.00778 


.00471 


-.01174 


.46542 


-.29699 1.0 



Table 4.7: Evolution of the motion and structure parameter estimates for a textureless planar 
surface using the FICE and a Powell hybrid implementation of the nonlinear equation. Motion 
and structure parameters are as specified in table 4.1. 
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and the true parameters was obtained for initial estimates relatively close to the true value of 
the normal, but no systematic way of determining the initial estimate was found. A classical CE 
implementation cannot be used, as in the previous section, to obtain a crude initial estimate, 
because such an implementation cannot deal with a textureless surface where all the variations 
are due to shading. 

The main reason for the near failure in determining the structure and motion parameters, 
in the textureless planar surface case, is that the shading variations are too weak, even for 
a very close light source, to generate sizable spatial gradients. The problem is easier, in this 
respect, for quadratic patches because the shading variations are much more pronounced due 
to the curvature of the surface and the spatial gradients have significantly more contrast. Un- 
fortunately, quadratic patches are far more complex to deal with, and only the distant source 
case can be reasonably dealt with. Chapter 5 presents an example of a textureless surface for 
a distant light source. 

4.2.3 Attenuated Lambertian Model 

The attenuated Lambertian model is closely related to the previously described nearby source 
model, with the addition that the irradiance is now proportional to the reciprocal of the square of 
the distance between the source and the planar patch. Such a model is usable for the estimation 
of the motion and structure parameters of a textureless surface, because the shading across the 
surface is not uniform. By comparison to the previous model, the spatial variations of the 
shading are stronger, and the spatial gradients have higher contrast because of the additional 
attenuation term that modulates the irradiance at each point. In fact, each point on the surface 
sees a light source in a slightly different spatial position and with a slightly different intensity. 
Figure 4.8(a) presents a planar surface with an attenuated Lambertian reflectance function and 
illuminated by a nearby source; figure 4.8(b) displays the same surface illuminated by the same 
light source but with a regular, nonattenuated, Lambertian surface and figure 4.8(c) shows the 
ratio of the irradiance of the attenuated Lambertian surface of figure 4.8(a) and the irradiance 
of the Lambertian surface of figure 4.8(b). As such, figure 4.8(c) corresponds to the irradiance 
of a textureless surface with an attenuated Lambertian reflectance and illuminated by the same 
nearby source. As expected, the spatial shading variations over the surface are more pronounced 
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than in the nearby case with a nonattenuated Lambertian reflectance (see figure 4.5(c) and 
figure 4.8(c)); the estimation should to be easier to carry out, i.e. the convergence rate should 
be higher and the choice of the initial estimate for the surface normal less critical than before. 

The general attenuated case is virtually intractable without the source-viewer approxima- 
tion, presented in section 3.2.2, resulting in the simplified system of equations (4.12), (4.13) and 
(4.14). Table 4.8 presents the results of a typical run for a textureless planar surface with the 
structure and motion parameters described by table 4.1. The experiment was run with "near- 
perfect" data, i.e. with synthetic floating point irradiance data. The convergence is about eight 
time faster than in the nearby case although still slower than comparable runs for a textured 
surface. The estimates of the solution tend to approach the neighborhood of the solution fairly 
fast and then the rate of convergence drops dramatically. These experiments suggest that faster 
convergence could be obtained with an implementation that switches numerical method once 
the estimated solution is close to the true solution. 

Table 4.9 displays the final results and relative errors for 8-bit quantized irradiance data. It 
was not necessary to provide an initial estimate that was very close to the true solution for these 
experiments, unlike the nonattenuated Lambertian nearby case, because the spatial gradients 
due to shading were strong enough to allow the algorithm to lock on the correct solution. The 
relative errors ranged from about .5% to 3% for the various components of the parameters. 

4.3 Summary of Results and Comments 

This chapter presented several implementations for different types of Lambertian surfaces and 
illumination conditions (nonattenuated, attenuated), different types of textures (gratings, con- 
stant, real data) and for various numerical methods in the planar patch case. It was found 
that the FICE performs better than its conventional CE counterpart in cases where the CE has 
been traditionally used (Lambertian surface illuminated by an infinitely distant punctual light 
source or hemispherical extended light source), and that the FICE implementation can recover 
the structure and motion parameters in the case of a textureless surface. 

The results of the simulations demonstrated the advantages of the FICE in providing extra 
accuracy in computing the parameters and in extending the range of cases that can be handled 
by the algorithm. Specifically, the FICE formulation can deal with weak surface markings 
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Figure 4.8: Exponentially damped cosine on a slanted plane, (a) (top left) irradiance image 
with close-up source and attenuated Lambertian reflectance function, (b) (top right) irradiance 
image for close-up source and regular Lambertian surface, (c) (bottom) irradiance image for 
textureless attenuated Lambertian surface. 
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Iter 




Rotation 




Translation 




Normal 


No 


<*>i 


U>2 


o, 3 


ti 


t 2 


t 3 


ni 


n2 n3 


1 


.01117 


.00100 


.01442 


-.00107 


.01251 


-.01378 


.01210 


-.00769 1.0 


2 


.01112 


.00091 


.01447 


-.00110 


.01269 


-.01376 


.01784 


-.01119 1.0 


3 


.01108 


.00081 


.01451 


-.00112 


.01284 


-.01375 


.02335 


-.01446 1.0 


4 


.01104 


.00073 


.01455 


-.00112 


.01298 


-.01374 


.02864 


-.01752 1.0 


5 


.01101 


.00064 


.01457 


-.00112 


.01309 


-.01372 


.03370 


-.02036 1.0 


10 


.01088 


.00029 


.01458 


-.00096 


.01338 


-.01365 


.05569 


-.03188 1.0 


50 


.01045 


-.00077 


.01372 


.00066 


.01231 


-.01340 


.12339 


-.07127 1.0 


100 


.00997 


-.00105 


.01327 


.00129 


.01131 


-.01343 


.13901 


-.10291 1.0 


200 


.00922 


-.00134 


.01283 


.00187 


.01013 


-.01353 


.15403 


-.14948 1.0 


300 


.00870 


-.00163 


.01247 


.00235 


.00930 


-.01355 


.17161 


-.18277 1.0 


400 


.00829 


-.00196 


.01213 


.00286 


.00862 


-.01351 


.19346 


-.20874 1.0 


500 


.00797 


-.00234 


.01175 


.00344 


.00800 


-.01343 


.22020 


-.23015 1.0 


1000 


.00707 


-.00461 


.00947 


.00680 


.00538 


-.01227 


.40387 


-.29016 1.0 


1500 


.00698 


-.00520 


.00878 


.00774 


.00473 


-.01176 


.46224 


-.29587 1.0 


2000 


.00698 


-.00523 


.00873 


.00781 


.00469 


-.01172 


.46607 


-.29565 1.0 


2047 


.00698 


-.00523 


.00873 


.00781 


.00469 


-.01172 


.46615 


-.29563 1.0 



Table 4.8: Evolution of the motion and structure parameter estimates for a textureless planar 
surface with an attenuated Lambertian reflectance function using the FICE and a Powell hybrid 
implementation of nonlinear equation. Motion and structure parameters are as specified in 
table 4.1. 





Rotation 


Translation 


Normal 


True para. 


.00698 -.00524 .00873 


.00781 .00469 -.01172 


.4663 -.2956 1.0 


Last est. 


.00694 -.00508 .00877 


.00757 .00472 -.01184 


.4798 -.2861 1.0 


Rel err (%) 


.6 3.0 .5 


3.1 .6 1.0 


2.9 3.2 .0 



Table 4.9: Final estimates and relative errors of the motion and structure parameters using 
the FICE on the synthetic quantized irradiance sequence depicted in figure 4.5(a). All the 
parameters are specified by table 4.1. 
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and textureless surfaces provided the spatial variations of the shading are strong enough to 
generate sufficient spatial gradients as in the nearby source and attenuated Lambertian cases. 
The major drawback of the FICE algorithm is its very high computational cost as well as its 
numerical implementation complexity. It is clear that such a formulation is not recommended in 
applications were only a reasonable estimate is sought since a much simpler CE implementation 
can provide the required accuracy in cases where the data are reasonably approximated by the 
constraint equation (see section 3.3.1). On the other hand, if the accuracy of the solution is 
the prime concern, the FICE implementation is a legitimate candidate. 

In addition to the use of the FICE, this chapter showed the advantage of the DFU in- 
cremental method to compute the motion parameters. Such a method enabled us to run the 
algorithm on real data acquired by a video camera and can be implemented very efficiently 
for rigid motion and optical flow applications independently of the type of constraint equation 
used. 



Chapter 5 



Quadratic Patch Estimation 



This chapter discusses the specific problems and forms of the general minimization equations, 
developed in chapter 2, when applied to quadratic patches. Specific implementations with 
specific shading models are derived and results of the algorithms on synthetic data presented. 
Geometrically, a quadratic patch is only slightly more complicated than a planar patch and 
is fully defined by the normal no at a given point Zo and its associated principal curvatures 
d Pl and d P2 , along the principal axes pi and P2; alternatively, the curvature tensor is repre- 
sented, for convenience, by the vector do at Zo (see figure 5.1). The implementation of the 
quadratic patch case for the CE is only marginally harder than the planar patch situation, 
and Negahdaripour (1986) proposed an implementation for the direct motion case. One of the 
major differences with the planar case is that no closed-form solution exists and the structure 
and motion parameters are estimated by an iterative procedure similar to the semilinear iter- 
ative procedure used for the planar patch case. However, in the FICE formulation, the task 
of estimating the parameters is an order of magnitude more complex, even in the case of the 
simplest distant punctual source. The complexity stems from the need to determine the nor- 
mal n and/or unit normal n at every point on the surface as a function of the normal no and 
curvature vector do, and to evaluate the shading at each point. Contrary to the planar case, 
spatial shading variations always occur, even in the distant source case, and its spatiotemporal 
variations need to be tracked in the FICE formulation. Figure 5.2(a) and figure 5.2(b) show 
two examples of shading caused by a distant punctual source for an elliptic hyperboloid and an 
ellipsoid respectively. The complexity of the formulation and therefore of the implementation 
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Figure 5.1: Quadratic patch geometry. 

limit the study of the quadratic patch case to the distant punctual light source case which can 
be solved for both textured and textureless surfaces. 

The next section presents the analytical formulation of the minimization equations in the 
distant source case and is followed by examples of runs on synthetic data. 



5.1 Implementation of the Quadratic Patch Case 

A quadratic patch is fully denned by two vectors, a normal no and a curvature(tensor) do at a 
single point on the surface, and the reciprocal of the depth Z can be expressed in terms of the 
image coordinates, r = (x,y,F), the normal no and the curvature do at Zo. Using a second 
order Taylor series of the reciprocal of the depth around the point Zo (see equation 2.22), 1/Z 
can be approximated by 

| = ^(r-no) + (q.d ) (5.1) 

where q = (^x 2 ,xy, \y 2 ) T is a quadratic coordinate vector, and d = —(Z xx , Z xy , Zyy) T is the 
curvature vector at the point Zo. It should be noted that, in the quadratic patch and higher 
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Figure 5.2: Irradiance image of an elliptic hyperboloid (a)(left) of an and ellipsoid (b)(right) 
illuminated by a distant punctual light source. 

dimensional patch cases, the reciprocal of the depth is only given by an approximate equation 
at a given order, unlike the planar case where 1/Z can be expressed exactly in terms of the 
surface coordinates. Equation (5.1) is a second-order approximation. 

The normal n and unit normal n at image point r can be expressed, to second order, by 



n = no + Zo(l - (p • no))Hr 



(5.2) 



and 



n = Ao + A(l - r • n ) (l 3 - non^Hr - ~^s (r T Hrfio + 2(fto • Hr)Hr) (5.3) 



l n 0|| 4||n 

where r = (* y 0) r , H = H 2 - 3HnofijH and 

/ -2 M -Z t 



H = 



xx a my " 

— Z X y —Zyy 



\ 0/ 

The details of the derivation that leads to the equations (5.2) and (5.3) can be found in ap- 
pendix E. The next section derives the quadratic patch minimization equations for a distant 
punctual source model. 
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5.1.1 General Equation for the Far-away Punctual Source Case 

Let us consider a distant punctual source in the direction 1 and examine the shading equation 
for a quadratic patch. Using the second-order Taylor series expansion of the unit normal n 
(equation 5.3), the distant shading equation takes the form 

E(r,t) = p((l • no) + Zol^Hr - Z (t . n )l^° - ^V((I • n )H + 2(H r nol*')H)r) (5.4) 

where Z = Z / ||no|| and lj° =!-(!• no)no. Equation 5.4 shows that the irradiance is a 
quadratic functional of the image coordinates r and includes constant and linear terms. If 
the light source direction 1 is parallel to the normal n , the previous equation takes the much 
simplified form 

E(r,t) = p(l - i£o 2 r T (H 2 + (HAo)(HAof)r) 

that is, the spatial irradiance variations are purely quadratic without any linear component, 
and the spatial gradients V r are linear, 

VE T = -/>Zo 2 (H 2 + (Hno)(Hno) r )r. 

Figure 5.3 shows the irradiance, horizontal and vertical gradients of an elliptic hyperboloid with 
a light source I parallel to the normal no of the tangent plane at the origin and figure 5.4 shows 
the irradiance, horizontal and vertical gradients of an elliptic hyperboloid with a light source 
not parallel to the normal no- The smoothness of the spatial shading gradients suggests that 
the convergence of the algorithm for a textureless surface is going to be very slow and that the 
global behavior of the algorithm is going to be similar to the nearby source, textureless, planar 
patch one. 

Let F(a>,t,no,d ) = E t - (v-u>) - (r-n )(s-t) - (q-d )(s-t) -p(r)[l,u>,n(r)], where n(r) 
is specified by equation (5.3). The unconstrained minimization equation can be written in the 
form 

C = mm JJ^F 2 + fi(r)(E(T,t) - p(r)(l • n(r))))dr + A(||no|| 2 - l). 

At an extremum of C, the derivatives of C with respect to the motion parameters u>, t and 
structure parameters no, do are zero, i.e. 

dC n 8C n 8C n J dC n 

^ =0 ' w= ' ^r ^ wr°- 
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Figure 5.3: Elliptic hyperboloid illuminated by a light source 1 parallel to the normal no of the 
tangent plane at the origin, (a) (top) irradiance image, (b) (bottom left) horizontal gradient, 
(c) (bottom right) vertical gradients. 
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Figure 5.4: Elliptic hyperboloid illuminated by a light source 1 not parallel to the normal n 
of the tangent plane at the origin, (a) (top) irradiance image, (b) (bottom left) horizontal 
gradient, (c) (bottom right) vertical gradients. 
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The resulting system of vectorial nonlinear equations, that defines the parameters t, u>, no and 
d , takes the form 

J J F(-v - p(r)(n X l))dr = (5.5) 

JJ F(-s(r. ho)- s(q. d Q )) dr = (5.6) 

JJ F{-q(s.t))dr = (5.7) 

//(F(-(s.t) r -p (r) (^)Vxl))-MrMr)(^) r !) dr + Aho = (5.8) 

where — - is computed using (5.3). The expression of this partial derivative is complex and 
can be found in appendix E along with its derivation. The Lagrange multiplier A and the 
Lagrange multiplier function fi(r) are eliminated by taking the dot product of equation (5.8) 
with the vectors no and 1 and solving the resulting scalar linear system in the unknowns A and 



11 



di 



H(r)p(r)dT. The expressions for the multipliers are very cumbersome, due to the - — term, 

oho 
and are omitted here. Once the multipliers are determined, their value is plugged back in (5.8) 

to obtain a system of four nonlinear vectorial equations in the unknowns u>, t, no and d . 

It should be noted that the previous vectorial equations are, formally, fairly similar to 

those of the system S obtained in the distant planar patch case. Equations (5.5) and (4.2) are 

formally identical, the new parameter d adds an extra term in (5.6) and gives rise to the new 

equation (5.7). However, the similarity between the two systems of equations in the planar and 

quadratic cases is deceptive. In fact, the unit normal n = n(r) is no more a constant vector 

but depends on the image coordinates r and the previous system of nonlinear equations is far 

more complex to implement than its planar patch counterpart. In particular, equation (5.8) is 

extremely hard to implement numerically and the scalar expressions are messy due, in part, to 

the lack of privileged axes of projection, found in the planar case. 

5.1.2 Solution of the Global Nonlinear System 

The previous system can be solved either as a global nonlinear system or as a semilinear system. 
The global nonlinear system was not implemented because the simpler planar case was already 
exhibiting fairly poor convergence properties, compared to semihnear implementation, and the 
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quadratic case is even more nonlinear and of higher dimension and its behavior can only be 
worse. 

The first equation is linear in u>, t and do, the second in u> and t, the third in u> and do and 
the last in do only. Two semilinear implementations can be considered. The global system can 
either be broken into a linear system in the unknowns u> and t with the first two equations and 
a nonlinear system in do and no with the last two equations or into a linear system in u? and do 
with the first and third equations and a nonlinear system in t and n with the second and fourth 
equations. The former implementation was chosen for two distinct reasons: its implementation 
is fairly close to the linear patch implementation, and it segregates the parameter estimation 
into motion parameters obtained from the linear system and structure parameters obtained 
from the nonlinear system. The linear system can be written in the form 

5 ;)(:)■(:)■ 

where Mi and ei are the same as in the planar case, with n replaced by fio, and 
N 2 = /^(r.n + q-do)((v + ^(r)(n xi))s r )<fr 

N 4 = // (r • n + q • d J ss r dr 

f 2 = E t (r-h + ti-do)sdr 

and the solution to the system (5.9) is given by 

t = (N 4 - NjM^NaJ-^NjM^e! - f 2 ) 

(5.10) 

u> = -Mj^d + N 2 t) 
The nonlinear part of the system was implemented with Powell's hybrid method. 

The next section presents two examples that illustrate the performance of the algorithm and 
compares the results to those given by a classical CE implementation in the case of a textured 
surface. 

5.2 Examples 

Two examples are presented in this section, a saddle surface with an exponentially damped 
zone plate synthetic texture and the same surface with a constant albedo (textureless surface). 



5.2. EXAMPLES 



141 



Rotation in radians: 


u>! = -.01047 


u> 2 = .00698 


a>3 = .00785 


Rotation in degrees: 


o>i = -.6 


U>2 = .4 


a>3 = .45 


Translation: 


t x = .00468 


t 2 = -.00625 


t 3 = .00895 


Translation in pixels: 


ti = .6 


t 2 = -.8 


t 3 = 1.1 


Normal to tangent plane at origin: 


ni = .3640 


n 2 = -.2851 


n 3 = 1.0 


Orientation of tangent plane: 


Slant = 15° 


tilt = 20° 




Curvatures at origin: 


d x = -1.0 


d 2 = 0.0 


d 3 = 1.0 



Table 5.1: Motion and structure parameter values used in the quadratic experiments with 
synthetic data. 



Figure 5.1 lists the motion and structure parameters used in the quadratic patch experiments. A 
large field of view of 90° is used in the experiments so that the second order terms are significant 
and the saddle surface does not appear similar to its tangent plane at the origin. The goal of 
these examples is to show the increased accuracy of the FICE algorithm over the traditional 
CE algorithms, in the case of textured surfaces, and to demonstrate the ability of the algorithm 
in recovering the motion and structure parameters for textureless quadratic surfaces. The data 
that are used are synthetic floating point data, i.e. the irradiance surface were raytraced as 
floating point arrays, and the spatial gradients are estimated directly from these arrays instead 
of the eight bit quantized irradiance images. Such data were used to assess the performance of 
the algorithm with respect to the convergence speed and the initial estimates choice without 
having to worry about the errors introduced by the quantization process. These experiments 
are meant to show the maximum performance that can be obtained with nonquantized data. 



5.2.1 Textured Surface 

As it was argued in section 5.1.2, only a semilinear implementation was used in the quadratic 
patch case with u> and t computed from a linear system of equations and fio and d from a 
nonlinear system of equations. In many cases, the algorithm failed to converge to a solution or 
to the correct solution for an arbitrary initial estimate of the structure parameters when the 
tangent plane of surface at the origin was highly slanted (r or a > 25°) or the curvatures were 
high. In the example used in this section, the tangent plane slant and tilt were small enough 
(r = 15 and a = 20°) so that an initial estimate of no = (0 1) (frontal tangent plane) was 
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In.it normal 
Init curvat 


ni = 0.0 
di = 0.0 


n 2 = 0.0 
d 2 = 0.0 


n 3 = 1.0 
d 3 = 0.0 




FICE 


True parameters 


Last estimates (1648) 


Relative error (%) 


Rotation 


-.00470 .00698 .00785 


-.00476 .00716 .00778 


1.34 2.61 


0.84 


Translation 


.00468 -.00625 .00859 


.00464 -.00537 .00869 


0.85 1.95 


1.21 


Normal 


.36400 -.28510 1.00000 


.37450 -.27760 1.00000 


2.91 2.61 


Curvature 


-1.00000 .00000 1.00000 


-1.01840 .05100 .09784 


1.84 0.51 


2.16 


CE 


True parameters 


Last estimates (1841) 


Relative error (%) 


Rotation 


-.00470 .00698 .00785 


-.00523 .00667 .00882 


11.4 14.5 


12.3 


Translation 


.00468 -.00625 .00859 


.00411 -.00711 .00982 


12.1 13.8 


14.4 


Normal 


.36400 -.28510 1.00000 


.30460 -.32800 1.00000 


16.7 15.4 


Curvature 


-1.00000 .00000 1.00000 


-1.17800 .12300 .83500 


17.8 12.3 


16.5 



Table 5.2: Final estimates and relative errors of the motion and structure parameters using the 
FICE and CE on the synthetic, floating point irradiance sequence depicted in figure 5.5. All 
the parameters are specified by table 5.1. 



adequate for all types of textures used. On the other hand, an initial null estimate for the 
curvature (planar surface) only works for sufficiently high contrast textures. Weak textures 
require a more accurate initial estimate for the curvature and d = (-.5 X .5) was found to 
be adequate for any texture mapped onto the surface specified by the parameters of table 5.1. 
Figure 5.5 shows the irradiance and spatio-temporal gradients of the quadratic patch, defined 
by table 5.1, mapped by an exponentially damped zone plate texture. Table 5.2 shows the 
real values, the initial and final estimates of the motion and structure parameters as well as 
the corresponding relative errors for two experiments using the irradiance data of figure 5.5(a). 
The two sets of results that appear in table 5.2 are for an implementation of the classical CE 
and for an implementation of the FICE. 

It does not come as a surprise that the number of iterations (1648) required to converge 
to the right solution is larger than the equivalent planar patch case since the problem is of 
higher dimension, 11 parameters to recover instead of 8, and the nonlinear part of the system 
is numerically far more complex. In addition, the cost of solving the nonlinear system for the 
structure parameters, no and d is about 14 times the cost of solving the nonlinear equation 
for the parameter no in the planar patch case. 

The accuracy of the estimates for the FICE experiment, less than 3% error, is very good 
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Figure 5.5: Exponentially damped zone plate texture for a saddle surface defined in table 5.1. 
(a) (top left) is the irradiance image, (b) (top right) the temporal gradient, (c-d) (bottom left 
and right) the x- and y-gradients. 
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considering that only the irradiance data were synthesized (in floating point accuracy) and the 
spatiotemporal gradients were estimated from these irradiance data. The results for the CE 
implementation have a much higher bias than in the planar patch case and the relative errors 
range from 11 to 18%. These results are not really surprising because the irradiance data for 
a quadratic patch exhibits a strong spatial shading, induced by the distant source, across its 
surface and the temporal shading variations are far more substantial than in the planar patch 
case. The strong shading effects are best seen on figure 5.6(a) for a textureless surface. 

Experiments with eight bit quantized data were not very impressive because of the strenuous 
requirement of providing a fairly good initial structure estimate in order to converge to the 
correct solution. In the case of the quadratic patch with a distant light source and quantized 
data, the CE implementation did not provide sufficiently good estimates, i.e. within 20 to 25% 
of the real values, except in the case of very strongly marked surfaces. Similarly, experiments 
with real data were inconclusive because the uncertainties on the true measured motion and 
structure parameters were so large, that the relative errors between the final estimates and the 
measured values of the parameters were totally meaningless. 

5.2.2 Textureless Surface 

In this experiment the same geometric saddle surface, defined by the parameters of table 5.1, 
is used but the albedo is now constant on the surface. Figure 5.6 shows the irradiance and 
the spatiotemporal gradients for the textureless surface. It can be observed that the shading 
spatial gradients are stronger for the textureless quadratic patch case than for the textureless 
planar patch case. This situation should stimulate a faster convergence to the correct solution 
and increase the region of convergence i.e. the algorithm for the quadratic patch case has 
the potential of being less sensitive to the initial estimate than the corresponding planar patch 
algorithm. Unfortunately, the example in the previous section demonstrates the need for strong 
spatial gradients to insure convergence for arbitrary initial values for the structure parameters 
and a textureless quadratic surface does not provide gradients with enough contrast. The 
example of figure 5.6 did not converge to the correct solution for the canonical initial values 
(no = (0 1) and do = 0) and a new initial estimate was selected. Table 5.3 displays the true 
parameters, the initial values and the final estimates and provides the relative errors between 
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Init normal 
Init curvat 


ni = 0.0 
d x = -0.5 


n 2 = 0.0 

d 2 = 0.0 


n 3 = 1.0 
d 3 = 0.5 


FICE 


True parameters 


Last estimates (5410) 


Relative error (%) 


Rotation 


-.00470 .00698 .00785 


-.00478 .00674 .00794 


1.78 3.41 1.15 


Translation 


.00468 -.00625 .00859 


.00462 -.00726 .00877 


1.31 1.62 2.12 


Normal 


.36400 -.28510 1.00000 


.37520 -.29950 1.00000 


3.10 2.98 


Curvature 


-1.00000 .00000 1.00000 


-1.02410 -.00112 1.02680 


2.41 1.12 2.68 



Table 5.3: Final estimates and relative errors of the motion and structure parameters using the 
FICE on the textureless synthetic, floating point irradiance sequence depicted in figure 5.6. All 
the parameters are specified by table 5.1. 



the true parameters and the final estimates. The errors are more or less identical to those 
obtained in the textured case but the number of iterations is more than three times the number 
of iterations required in the textured quadratic patch case. A strict comparison in number of 
iterations required by the planar and quadratic patch case, for a textureless surface, as opposed 
to a textured surface, is impossible because, in the former case, the same initial estimate is 
used in the textured and textureless surface situations and, in the quadratic case, a change of 
initial estimates was required. 

The results of the experiments with quantized data were very similar to the results of 
the experiments in the planar patch case: very good initial estimates are required to attain 
convergence to the correct solution. The problem of systematically finding the initial values 
is very difficult in the textureless case because the CE algorithm cannot handle textureless 
surfaces and is, therefore, unable to provide any useful solution. 

5.3 Conclusions 



The overall implementation of the quadratic patch case in the simplest possible shading case, 
the distant punctual source, turned out to be extremely hard and the convergence properties 
rather poor for a random initial estimate of the surface normal and quadrature vector and an 
arbitrary quadratic patch geometry. The simple, canonical initial estimate of a frontal plane 
i.e. no = (0 1) and do = failed in many cases unless the curvatures and inclination of the 
tangent plane at the origin were small or the surface markings very strong. The convergence 
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Figure 5.6: Textureless saddle surface defined by table 5.1. (a) (top left) is the irradiance image, 
(b) (top right) the temporal gradient, (c-d) (bottom left and right) the x- and y-gradients. 
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properties are worse than those of the planar patch case because the overall system is far more 
nonlinear and the number of unknows is higher. The nonlinear portion of the global system 
is no more a relatively simple nonlinear vectorial equation, but is now a full system of two 
vectorial equations, one of which (5.8) is very complex. The only guaranteed way of converging 
to the correct solution is to provide a fairly good initial estimate of the normal and curvature 
vectors. Negahdaripour's iterative scheme (Negahdaripour 1986), which solves the problem 
for the classical constraint equation, was used to obtain an initial estimate in cases where the 
texture gradients were strong and reasonable estimates could be computed using the CE scheme. 
On the other hand, very good results were obtained for both textured and textureless surfaces. 
In the textured cases, the FICE performances were much superior to the performances of the 
CE implementation and the relative error was two to three times smaller. Very nice results were 
also obtained in the textureless case, a case that cannot be handled by CE based algorithms. 

These implementation difficulties demonstrate that the quadratic patch case is really the 
most complex case that can be solved using this formalism. As it was hinted before, the 
implementation complexity grows an order of magnitude faster than the degree of the surface 
patch, leading very early on to intractable systems. The main source of complexity is due to 
difficulty in expressing analytically the unit normal at each point of the surface and in computing 
its derivatives with respect to the fixed parameters of the patch (e.g. no and do). On the other 
hand, it is possible to recover the structure and motion parameters for a textureless surface and 
the parameters obtained in the textured surface case are more accurate than those determined 
by an algorithm that relies on the CE. 
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Chapter 6 



Multiframe Formulation 



This chapter discusses the multiframe implementation of the DFU constraint equation intro- 
duced in chapter 2 and compares it to two alternate algorithms that rely on a more classical 
use of the regular or full irradiance constraint equation. The performances of the various multi- 
frame algorithms are examined with special emphasis on the relationship between the number 
of frames used, the amount of noise present in the irradiance sequence and the accuracy of the 
estimated results. The performances are evaluated for synthetic data, where noise of known 
variance is added and systematic comparison of performance is possible, and for a sequence of 
real data acquired with a video camera. 

The multiframe techniques described in this chapter do not rely on the use of shading 
information and can be utilized with any differential technique that uses a constraint equation. 
Moreover, shading information and multiple frames are two different techniques that can be 
used separately or together in a given implementation and the performance of each one does 
not affect the other one. The multiple-frame schemes that are presented are not only applicable 
to algorithms that compute the rigid body motion and structure but also to algorithms that 
estimate dense optical flow. The results are more dramatic in the former case because the 
highly overconstrained problem provides a better noise smoothing capability than in the case 
of the optical flow estimation that is a far less overconstrained problem. 

In this chapter, the minimization equations are presented for the full irradiance constraint 
equation with a distant light source model; the algorithms are run on data compatible with 
these assumptions. The real sequence, processed by the multiframe DFU incremental FICE, 
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was used in chapter 4 in the experiments with real data for a planar patch. The example will 
be further studied in this chapter to show how the accuracy of the estimated parameters varies 
when the number of frames used in the multiframe DFU algorithm is changed. 

6.1 Multiframe Algorithm Implementations 

The global goal in using multiple frames is to reduce the sensitivity of the algorithm to noise by 
using a larger amount of correlated data in the least-squares formulation. The key to this type 
of formulation is the derivation of a constraint equation, or of a series of constraint equations, 
that link the various frames so that a global parameter estimation can be performed on the 
augmented spatiotemporal block of data. 

Three different schemes are presented in this chapter: a central frame algorithm, an in- 
cremental constraint equation (ICE) scheme and the DFU algorithm that was introduced in 
chapter 2 in the two frames case. All three algorithms share the assumption that the rigid body 
motion is constant within the temporal analysis window. More specifically, the translational 
velocity t, the rate of rotation ||u>|| and the direction and position of axis of rotation are con- 
stant. Under this assumption the rotational and translational displacements of the rigid object 
in frame i, with respect to the initial frame 0, are expressed, for a unit time step, by u?W = ua 
and t(*) = it, respectively 1 . This assumption seems very restrictive but is in fact fairly benign 
for temporal windows of three to seven frames, in the case of ordinary motions. It will be shown 
later that the assumption can be somewhat relaxed to allow slowly varying motions from frame 
to frame. 

The next three sections present the three multiframe algorithms for recovering the motion 
and structure parameters and are followed by a study of the performance of some of the algo- 
rithms with respect to noise. In order to simplify the mathematical expressions, the algorithms 
which are derived for the planar patch case, instead of the generic depth map case, use the full 
irradiance with the distant source model, and assume that the surface is textureless. The algo- 
rithms can be adapted to the different light source and surface models that were presented in 
the previous chapters at the cost of a more complex formulation. In particular, textured surface 
algorithm implementations are obtained by the introduction of the shading model constraint 



'in this chaptei the parenthesized superscripts refer to the frame number. 
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via a Lagrange multiplier function that is eliminated in a manner similar to the one employed 
in section 4.1.1.1. 

6.1.1 Central Frame Algorithm 

The central frame algorithm is a straightforward extension of the regular two-frame algorithm. 
Let us assume that 2n + 1 frames, indexed from time t_ n to t n> are used and let us consider the 
In individual constraint equations CE{, i (E [— n, n] — {0}, between the reference frame and the 
frame i. Each constraint equation CE, relates the same reference frame to a given frame i in the 
sequence and the collection of constraint equations is globally minimized, or more specifically, 
the integral of the sum of the squares of all the 2n constraint equations is minimized. The choice 
of the reference frame is arbitrary and any of the 2n+l frames could be chosen. The advantages 
of the central frame as reference frame is that the symmetry introduces some simplifications 
in the expressions and the constant motion assumption is a better approximation since the 
maximal temporal distance between the reference frame and any other frame in the sequence 
within the temporal window is n. The minimization problem can be formulated as 



mm 



[Jl E (^ W -(v« (0 )-(r.ft)(..tW)_p|l > „(0 |ft ]) <fr + A(||n|| 2 -l)j. (6.1) 



To facilitate the expression of the summation, the k = term is included because E^ ' = and 

(j)(°) = t(°) = 0. Under the constant rigid body motion assumption, u^*) = iu> and t^ = it, and 

after some simple algebraic manipulations, it is apparent that this formulation and the resulting 

equations, obtained by differentiation of equation (6.1) with respect to the parameters t, u> and 

h, are similar to the one developed in section 4.1.1. For example, the equivalent linear system 

takes the form 

' M x M 2 \ / u, \ _ / e x 

Mf M 4 ) \ t / ~ \ e 2 

where Mi, M 2 and M4 are identical to the quantity of the same name in the system (4.6) and 



ei = 



Jl ( E aP/£, * 2 ) ( v + '(* >< I))*; *» = Jl ( E iE ?/i: i2 )(*- ft M r 



The overall effect of the central frame minimization is the averaging of the temporal gradients 
over the length of the temporal window. This result is not surprising because, by design, only 
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the spatial gradients of the central frames are used and all the other quantities, besides the 
temporal gradients, are independent of the frame indices. 

The only advantage of this formulation is its simplicity. The multiframe implementation 
is nearly identical to the two-frame implementation and the two-frame implementation is in 
fact the multiframe implementation for two frames. The drawbacks are numerous and serious. 
The formulation underuses the available data, the spatial gradients are only computed from the 
reference frame, and the rest of the sequence is merely used to compute the temporal gradients. 
This method only solves the noise problem in a weak sense because it only provides noise 
reduction by smoothing the gradient data in the temporal direction and it is totally ineffective 
in dealing with the noise present in the spatial gradients. In addition, for large motion, the 
smoothing of the temporal gradients can be detrimental to the convergence of the algorithm 
and worse results can be obtained in the multiframe case than in the two-frame case. This 
method was presented with the unique goal of showing how some quick improvements, over any 
two-frame algorithm, can be gained at the cost of very few minor changes to the two-frame 
implementation. 

The next section describes an algorithm, called the incremental constraint equations algo- 
rithm, that fully uses the spatiotemporal gradients of each frame in the sequence. However, the 
algorithm leads to a system of highly nonlinear equations that are very delicate to implement. 



6.1.2 Incremental Constraint Equation Algorithm 

Given n + 1 consecutive frames in a sequence, the ICE algorithm minimizes the integral of 
the sum of the squares of n constraint equations that link two adjacent frames at a time. 
Each constraint equation is a two-frame constraint equation, that relates the spatiotemporal 
gradients of two successive frames, and is, consequently, different from the constraint equation 
at neighboring pair of frames. Under the assumption of constant rigid body motion, the motion 
parameters and the intrinsic structure parameters (e.g. d ) are identical in each CE, but the 
normal to the planar patch, or to the tangent plane at the origin for higher order surfaces, is 
different. More specifically, in the distant source case, the minimization problem can be written 
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as 

T 1 [ C = Jl £ K* - ( v( ° •«)-(" n (i) )(s (,) • t) - p[U f *«>]) * - A(||n|| 2 - l)) , 

(6.2) 
where ft(*) = n + i(u> X n) and n = n(°) denotes the unit normal in the starting frame. In 
addition, the vector fields vW and sM are different in each frame i and are computed using the 
i th frame spatial gradients. 

Under the assumption of constant rigid body motion, this scheme amounts to performing 
n simultaneous two-frame minimizations to recover a common set of motion and structure 
parameters (e.g. u>, t and n in the planar patch case). Expanding the normal ft(*) in the 
minimization equation (6.2) and differentiating C with respect to the structure and motion 
parameters leads to the following system of three nonlinear equations 

jl l?(0 (_ V M + j( r x fi)( s (0 . t ) - p ((6 x 1) + in X (1 X u) - il X (u» X A)))dr = 

/7^)(-sW((r.n)4-[r,w,n]))dr = 
If \F® (-(s(') ■ t) (p + *(r x <*)) - p((l X «) - »(1 x«)x W ))jdr+An = 0, 

where FW(u^ t, n) = ^ - (vW • u0 - ((r • n) + i^ 

It can immediately be noticed that the previous equations are no longer linear in the motion 
parameter o>, resulting in the loss of the semilinearity of the overall system that can now only 
be solved by a general nonlinear global method. As a result, it should be expected that the 
convergence of the nonunear system is far from being guaranteed for an arbitrary set of initial 
conditions and an accurate initial condition is required. However, when restricted to two frames, 
the system becomes semilinear, its convergence properties are improved, and the solution can 
generally be found for an arbitrary initial condition. Therefore, the problem is first solved in 
the two-frame case and its solution is used as an initial estimate to the full multiframe system, 
resulting in a fast convergence to a more accurate solution. The convergence is fast because 
the initial value is usually a very good estimate of the final solution. 

The ICE algorithm fully uses all the irradiance data available in the sequence and provides 
a systematic way of computing the constant motion parameters and the structure parameters 
from a given set of frames in a sequence. Its drawbacks are twofold: the algorithm leads to a 
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fully nonlinear system of vectorial equations and the approach segments the temporal data into 
a set of n separate two-frame problems that are solved simultaneously instead of solving the 
(n + l)-frame problem directly. In this respect, the ICE algorithm methodology is similar to 
the one used by some nonlinear system solvers that decompose the structure of the system by 
considering each equation separately, and minimize the sum of the squares of the residuals of 
each equation instead of solving directly the initial system. 

The next section presents a method that offers more flexibility, globally uses all the data, 
and leads to a semilinear implementation of the system of nonlinear equations. 

6.1.3 Dynamical Frame Unwarping Constraint Equation 

Section 2.3.3 introduced the concept of the DFU constraint equation in the case of one virtual 
frame and two real frames. The DFU constraint equation algorithm relies on a dynamical CE 
that uses a displaced frame difference, computed for the current motion estimate, in lieu of 
the temporal gradients, and that minimizes the square of the residuals of the DFU constraint 
equation over the full image. The major difference with the classical method is that the con- 
straint equation, which is minimized at each iteration, is not static, but depends on the current 
estimate of the motion parameters; the spatial gradients are dynamically computed from the 
unwarped irradiance fields at each iteration. In effect, the DFU process takes the two irradiance 
fields and spatially distorts them so that a given point in the image plane is the image of the 
same object point at times t\ and t 2 « The unwarping process effectively undoes the motion to 
temporally align each object point so that it projects on the same image point throughout the 
sequence. 

It does not take much of a change to extend the unwarping process to an arbitrary number 
of frames. Let us define the global displaced frames difference, Z)G(xi,f,r,n,d t (xj,f)), as the 
sum of the 2n elementary DFDs formed by the n pairs of symmetric frames around the virtual 
frame 2 . Using the notations that were first introduced in section 2.3.3, the global DFD, Do, 
is given by 

n 

D G (3C-,t,T,n,a t (xi,t)) = £(fi(*i + (1 " &r)d t (xi,*),4 fc) ) - fifc - kr&t^t),^)) 

k-0 



In practice, the central frame is a virtual frame when the number of frames is an even number but is middle, 
real frame for odd number of frames. 
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where * ( _ fc) = t - krT* and t^ = t - (1 - kr)T*, and is used to derive the multiframe DFU-FICE 
expressed by 



A 3 (x i ,*,T,n,a<">) - (* (n) - d t ) ^ f I 3 - *r^L- j (V x u(xi + (1 - Jb-)d<"), f W) = £ (6.3) 

in the optical flow case. Equation (6.3) is the multiframe equivalent of equation (2.30) that was 
derived in the two-frame case. In the rigid body motion formulation, and in the simplified case 
where the motion field is locally constant, translational or slowly varying, equation (6.3) takes 
the simpler form 

D a (xi,t, r, n, d(">) - V G (xi, «+, r, n, d( n ))(u/ n > - «) + (r • n)S G ( Xi , t+,r, n, d( n >)(t("> -t) = E 

(6.4) 
where 

( n 

V G (xi,« + ,r,n,dW) = ^(x^l-rJdWfW) 

fe ^° • (6.5) 

S G (x i ,i + ,r,n,d(")) = £s( Xi ,(l-T)d("),*W) 

fc=o 

The multiframe equation (6.4) is identical to the two-frame equation (2.33), where the DFD is 

replaced by a global DFD, that is the sum of the individual DFD of a pair of frames symmetric 

about the virtual frame, and the v and s vector fields are replaced by the global V<j and Sq fields 

that are the sum of the individual v and s fields. In this respect, the multiframe formulation is 

formally identical to the two-frame algorithm and all the results and implementations described 

in the preceding chapters are directly applicable. 

The DFU constraint equation provides a uniform way of dealing with sequences of frames 

by dynamically unwarping the individual frames of the sequence to temporally align them, 

stack them into a spatiotemporal block of data and estimate, in one step, the motion and 

structure parameters of the full block of data. The elegance of the DFU algorithm is that the 

two-frame case is effectively the general case, and only modest modifications are required to 

deal with an arbitrary number of frames. The main drawback of the multiframe method is its 

higher computational cost, caused in part by the computation of the unwarped gradients at 

each iteration. Section 2.3.3 provided a more thorough discussion of the computational cost of 

the method. 



156 CHAPTER 6. MULTIFRAME FORMULATION 

In the practical implementation of the multiframe algorithm, it is advisable to start the 
iterations with only two frames and increase the number of frames only when a decent estimate 
of the motion parameters is available. In fact, if no a priori estimate of the motion is known, 
and the iterations are started by assuming a zero motion (the most probable alternative), it is a 
mistake to stack several raw frames at once, especially if the motion is large. The computation of 
the spatial gradients that uses the full spatio-temporal analysis window is then heavily distorted; 
the computed gradients are grossly inaccurate and can inhibit the convergence of the algorithm 
to the correct solution. In practice, the multiframe algorithm is run as a two-frame algorithm 
up to a point where the difference between two successive estimates of the parameters is below 
a predefined threshold, and then the full sequence, within the temporal window, is unwarped 
and the full multiframe algorithm enabled 3 . 

6.1.4 Slowly Changing Motion 

All the algorithms described in the previous sections were derived under the constant rigid 
body motion assumption and a unique set of motion parameters are computed for the set of 
frames belonging the temporal analysis window. This assumption can be somewhat relaxed and 
the algorithms modified to take into account slowly varying motion where the translation and 
rotation vectors can be thought of as made out of a root term, which represents the velocity at 
time t, and a first-order update term, which represents the incremental portion of the velocity 
vector between the frame at time t and the frame at time t + St. 

At a given time t the motion parameters are computed, using the multiframe algorithm, by 
assuming that the motion is constant within the spatiotemporal window of analysis. At the next 
time step, t+St, the spatiotemporal window is moved by St, and the new motion parameters are 
computed using the previous estimate as initial estimate. This process is particularly well suited 
to the DFU incremental FICE, because the incremental formulation only computes the update 
term and the computation can be performed very efficiently for small increments. In summary, 
slowly varying motion parameters are accommodated by assuming that the parameters are 
constant within the analysis window at a given time, and are changing slightly as the analysis 



In reality, as it was pointed out earlier, the unwarped sequence is never computed but the indices of each 
temporal arrays are shifted by the amount of estimated motion in each frame and the unwarping process is a 
mere computation of offset indices in the various arrays. 
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window is displaced temporally; an update of the motion parameters is computed at the next 
time frame. 

The next section presents synthetic and real data examples of multiframe parameter estima- 
tions. The emphasis of the section is the study of the change in the accuracy of the estimated 
parameters as the number of frames and the noise are varied. 

6.2 Examples 

Multiframe processing is a way to increase the robustness and accuracy of the solution by 
providing additional redundancy to the algorithm. Under the assumption that the noise in the 
irradiance sequence is a zero-mean, additive process uncorrelated with the signal, the accuracy 
of the estimated parameters is a monotonous increasing function of the number of frames; the 
error between the true and estimated parameters approaches zero as the number of frames 
increases (at least in the FICE case, where no bias is present due to the use of the constraint 
equation on data which do not obey it). 

These results are confirmed for synthetic images where the rigid body motion is truly con- 
stant and the noise is moderate (10 to 15 %). For high noise level, the computation of the 
gradients is too inaccurate and the algorithm can either diverge or converge to the wrong 
solution. The mode of failure and susceptibility to noise is different for the ICE and DFU 
algorithms. The ICE algorithm is very sensitive to noise, because of its high degree of nonlin- 
earity, and small amounts of noise are enough to prevent convergence to the correct solution. 
The DFU algorithm is much more robust in the presence of noise. However, the two-frame 
algorithm is fired first and a high amount of noise can make it diverge or prevent it from locking 
on the correct solution, in which case the multiframe algorithm cannot recover from the initial 
incorrect unwarping of the sequence and does not converge either. It is paradoxical that the 
multiframe algorithm behavior in the presence of noise is conditioned by the behavior in the 
two-frame algorithm, but the multiframe DFU algorithm can rarely be started directly, except 
in cases where the motion is moderate, because the stacking of many nontemporally aligned 
frames can prevent it from converging to the right solution. 

In the case of real data, the improvements in accuracy are only observed up to a certain 
point, generally five to seven frames for regular motions, and then the performance degrades. 
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The problem with real data is that the assumption of constant motion, within the temporal 
analysis window, is only a reasonable approximation for a small number of frames; when too 
many frames are used, the assumption breaks down and the results rapidly get worse. It was 
found experimentally that the maximum reasonable length of a centered temporal window is 
about seven frames, i.e. three frames before and after the frame under consideration. However, 
most of the time, five frames were enough to converge to the true solution with sufficient 
accuracy. 

Three experiments are presented in the next sections: the first one uses the central frame 
scheme, the last two the DFU algorithm. The first experiment demonstrates the limits of ap- 
plicability of the central frame algorithm; the second presents the results of the DFU algorithm 
for a synthetically generated sequence, in terms of the number of frames and of the amount of 
noise, and the last experiment shows the behavior of the DFU algorithm for real data. 

6.2.1 Central Frame Algorithm Example 

By design, this very simple scheme can only improve the accuracy of the estimated parameters 
on sequences where the main source of noise is in the temporal gradients and little can be 
achieved in terms of lowering the error in the estimates, caused by noisy spatial gradients, by 
using more frames. The noise in the spatial gradients is not averaged by this algorithm because 
only the central frame is used to compute the spatial gradients. In fact, when the noise is 
concentrated on the spatial gradients, the error in the estimates is approximately constant and 
independent of the number of frames used. The bias in the solution is caused by the noise 
in E T . When both the spatial and temporal gradients are noisy, the accuracy of estimates 
increases with the number of frames and eventually levels off at a level that represents the bias 
introduced by the noisy spatial gradients. 

The data in this experiment are totally synthetic: the spatial gradient of a planar patch 
mapped with a cosine grating with a sinusoidal phase variation (see figure 4.1) is analytically 
generated and the temporal gradients are computed directly from the distant source FICE. 
The true motion parameters are u> = (1.3, — .1, — 1.2)10 -2 radians, and t = (.25, .5, 1.25)10 -2 , 
and the unit normal is n = (.0371, -.0371,-9278). These ideal gradient data are selectively 
corrupted by 5% additive noise in three different ways — the temporal irradiance gradients Et 
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are noisy and the spatial gradients E T noise-free — the temporal gradients are noiseless and 
E T noisy — both E t and E r are noisy. The spatiotemporal gradients are directly input to the 
central frame algorithm. 

Table 6.1 displays the relative errors, expressed in percentages, between the true and the 
estimated parameters for multiframe runs ranging from two to six frames. As expected, the 
accuracy of the estimates increases with the number of frames in the case of noisy temporal 
gradients; the relative error reaches about 10 -3 for all the parameters when six frames are used. 
The performance of the algorithm is independent of the number of frames in the case where 
only the spatial gradients are noisy, and the computed solution exhibits a relative bias of about 
.5% at convergence. In the case of noisy spatiotemporal gradients, the accuracy of estimates 
initially increases with the number of frames then reaches a constant level, the bias level of the 
previous case, in about five frames. 

The analysis of these results clearly shows the strong limitation of the algorithm: only 
temporal gradient noise can be reduced by the central frame algorithm; spatial noise is not 
averaged but is translated into a bias in the final estimates of the motion and structure pa- 
rameters. On the other hand, the results clearly indicate the gains that are achieved by the 
multiframe formulation and prove that substantial performance improvement is available from 
these multiple-frame techniques. 

6.2.2 DFU Algorithm: Synthetic Data 

This experiment focuses on the measure of the accuracy of the estimates as a function of the 
number of frames and of the noise level. The irradiance data (see figure 4.1) are generated in 
floating point, then zero-mean, noncorrelated noise is added and the noisy data are quantized 
to eight bits. The synthetic sequence is generated from a perfect constant rigid body motion, 
that is given in table 4.1, along with the structure parameters. The irradiance sequence is the 
only input to the DFU algorithm and the spatial gradients are dynamically estimated from the 
unwarped, quantized irradiance frames. 

In this experiment, the multiframe algorithm is started as a two-frame algorithm and runs 
as such until the relative error between two successive estimates is less than 20%, at which point 
it is switched to the multiframe mode. The 20% threshold is rather arbitrary. It was determined 
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0.38 0.38 


0.01 
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7.83 7.86 7.30 


3 


1.27 1.27 
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1.43 1.43 1.59 
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0.54 


0.63 1.01 0.64 


0.94 0.95 1.40 
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0.33 0.33 


0.36 


0.45 0.34 0.32 


0.54 0.53 1.52 
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0.32 0.33 


0.27 


0.51 0.98 0.38 


0.41 0.42 1.57 



Table 6.1: Relative errors, expressed in percentage, of the motion and structure parameters 
as a function of the number of frames for the noisy synthetic irradiance sequence depicted in 
figure 4.1 processed by the central frame algorithm. 
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by trial and error and represents a safe level at which the unwarping of the full sequence by 
the current motion estimate does not prevent the algorithm converging to the correct solution. 
Three levels of noise were selected — 1%, 5% and 10%. Table 6.2 reports the results, which are 
expressed in percentages of relative error between the final estimates and the true value of the 
parameters, for the three noise levels and for a variable number of frames (2-7). Globally, it 
can be observed that the accuracy steadily increases with the number of frames and that this 
increase is particularly dramatic for the noisiest sequence, where the initial gain is the largest. 
In general, the gain in accuracy obtained from switching from two frames to three is much 
bigger than the incremental gain achieved by adding additional frames. The results suggest 
that, for a large number of frames, the errors is going to zero. Unfortunately, in most cases, 
the motion constancy assumption is met for at most seven frames; it should not be expected 
that the error will be zero for seven frames in the case of moderately noisy data. 

This experiment shows that the overall multiframe DFU algorithm is very robust against 
noise and that excellent performance can be achieved in cases where the rigid body motion 
is constant. However, we should be aware that the results on the accuracy of the estimated 
parameters obtained with these synthetic data represents an upper bound on the gain in perfor- 
mances that can be achieved for noisy quantized data. Although the DFU algorithm was run on 
the synthetic data in a realistic way, i.e. the noisy, quantized irradiance data were fed directly 
to the algorithm, these data are still ideal. In particular, the motion is rigorously constant, the 
shading is analytically computed and the added noise is very controlled. Real motion is rarely 
constant; even for a short temporal analysis window, the shading equation is, at best, a decent 
approximation of the real situation and the noise process is neither truly zero-mean, nor purely 
white and additive and worst results should be expected in the real data case. 

The next section presents the results of the same algorithm applied to real data acquired 
by a video camera. 

6.2.3 DFU Algorithm: Real Data 

The data in this experiment are those used and described in section 4.2.1.2. In particular, it 
should be recalled that the measured parameters from the set-up were not very accurate and 
that the final estimates, presented in table 4.6, were ultimately corrected for errors in the focal 
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5% noise in irradiance image 


# frames 


Rel. rotation error 


Rel. translation error 


Rel. 


normal error 


2 


1.41 3.99 1.31 


3.67 1.84 2.45 


6.51 


6.49 


3 


0.62 1.42 0.51 


1.45 0.72 1.04 


2.42 


2.36 
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0.29 0.61 0.32 


0.84 0.29 0.67 


1.03 


1.07 
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0.14 0.39 0.21 


0.52 0.12 0.41 


0.67 


0.69 
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0.08 0.21 0.13 


0.34 0.09 0.29 


0.31 


0.32 
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0.06 0.10 0.08 


0.15 0.07 0.19 


0.17 


0.18 



10% noise in irradiance image 


# frames 


Rel. rotation error 


Rel. translation error 


Rel. 


normal error 


2 


2.64 5.75 2.47 


6.12 3.42 4.89 


8.69 


8.71 


3 


1.04 2.12 1.09 


2.90 1.42 2.17 


3.12 


3.10 


4 


0.63 1.31 0.67 


1.72 0.80 1.19 


2.15 


2.13 


5 


0.41 0.77 0.41 


0.85 0.57 0.69 


1.41 


1.42 


6 


0.30 0.48 0.27 


0.49 0.30 0.37 


0.83 


0.84 


7 


0.19 0.28 0.15 


0.28 0.17 0.17 


0.41 


0.41 



Table 6.2: Relative errors, expressed in percentages, of the motion and structure parameters 
as a function of the number of frames for the noisy synthetic irradiance sequence depicted in 
figure 4.1 processed by the DFU incremental FICE algorithm. The true motion and structure 
parameters are given by table 4.1. 
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# frames 


Rel. rotation error 


Rel. translation error 


Rel. 


normal error 


Rel. avg. error 


2 


11.4 26.6 46.1 


39.7 43.5 13.2 


18.1 


16.2 


26.9 


3 


6.2 14.1 25.9 


21.3 22.7 7.1 


11.3 


9.4 


14.7 


4 


4.1 12.7 22.7 


18.8 19.1 5.9 


7.7 


5.2 


12.0 


5 


3.1 8.1 18.0 


14.1 16.4 4.1 


5.4 


3.4 


9.1 



Table 6.3: Adjusted relative errors, expressed in percentages, of the motion and structure 
parameters as a function of the number of frames for the real irradiance sequence shown in 
figure 4.4, processed by the DFU incremental FICE algorithm. The true motion and structure 
parameters are described in table 4.6. 



length of the camera, the distance between the projection center and the object and the position 
of center of rotation. As such, less attention should be focussed on the actual individual values 
of relative error and more on the global trend that is given by the average relative size of the 
errors as a function of the number of frames. In the case of real data, the exact relative error 
figures for the various parameters are nearly meaningless because the true parameters are not 
known with enough precision to perform a direct comparison with the actual parameters, but 
the performance of the algorithm can still be judged by the average relative accuracy of the 
algorithm with respect to the number of frames used in the estimation. 

Table 6.3 presents the adjusted relative errors between the measured and estimated param- 
eters, i.e. the relative error in the parameters once the correction to these final estimates is 
computed. The improvement in performance is dramatic and the average relative error drops 
from 27% in the two-frame case to about 9% in the five frame case. This example illustrates 
the necessity of a multiframe algorithm when dealing with real data as opposed to synthetic 
data or synthetic gradients data and clearly demonstrates the substantial gains in accuracy that 
can be achieved. It is fair to state that most two-frame algorithms that are described in the 
literature and only tested for synthetic gradient data could not be used directly for real data. 

6.3 Conclusions 



This chapter presented several multiframes techniques that are, intrinsically, independent of 
the type of constraint equation used (CE or FICE) and are applicable to rigid body motion 
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and structure estimation or to optical flow computation problems. 

The most attractive method was found to be the DFU incremental algorithm; because it 
can estimate all the motion and structure parameters of a full block of spatiotemporal data at 
once, it leads to an efficient, semilinear implementation; its complexity is independent of the 
number of frames and it can very efficiently compute the incremental motion update terms to 
deal with slowly varying motion parameters. 

The most severe restriction in these algorithms is the assumption of constant rigid body 
motion within the temporal analysis window; in practice, this assumption limits the useful 
number of simultaneous frames to five or seven. Experimental results on synthetic data suggest 
that, for moderate amount of noise, more than five to seven frames are needed fro the algorithm 
to converge to the exact solution. The underlying limitation of these multiple-frame algorithms 
is that they use a small number of frames at a time, i.e. the memory of the system is limited in 
practice to five to seven frames. A state variable approach, like the Kalman filtering approach, 
does not have these restrictions and might be more appropriate for dealing systematically with 
multiple frames. 



Chapter 7 



Summary and Conclusions 



This thesis addresses the problem of recovering the shape and motion of patches moving with 
respect to a fixed camera and fixed fight source. The primary goal is to recover the motion and 
structure parameters of a moving patch directly from the time-varying image sequence using 
motion and shading cues without either computing the optical flow or establishing correspon- 
dences between features in the image sequence. The primary goal consists of three subgoals, 
the use of shading information, the relevance of shading information, and the use of multiple 
frames. 

This thesis establishes a theoretical and computational framework for incorporating a priori 
knowledge of shading conditions and for using it, in conjunction with motion cues, to compute 
the motion and structure parameters of a generic patch. More specifically, a class of constraint 
equations that link the spatiotemporal gradients of the irradiance of a sequence of images to the 
temporal variations of shading is derived for arbitrary depth functions. The generic surfaces 
are specialized to polynomial patches in the minimization equations in order to express the 
depth functions in terms of a few structure parameters that fully define the patches. Further 
specializations to planar and quadratic patches are carried out later in the implementation 
examples. 

The advantage of the new class of constraint equations is twofold: it provides a more accurate 
solution to problems where surface irradiance changes are due to motion, and it provides a 
solution for textureless surfaces. Under the classical CE, which assumes that the irradiance of 
a a given point does not change with motion, only highly textured surfaces can be used, and 
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the solution is only approximate in cases where the classical CE is not a good approximation. 

The minimization equations are derived in a wide variety of cases and a specific imple- 
mentable set of equations is given for different combinations of albedo (arbitrary and constant), 
source types (collimated and extended), source positions (distant or nearby), surface geometry 
(planar and quadratic) and irradiance models (general Lambertian and attenuated Lamber- 
tian models). The minimization equations form a system of three or more vectorial nonlinear 
equations in the motion and surface parameters. Several semilinear and globally nonlinear pro- 
cedures are described and their relative performances discussed. Perfect results, in terms of 
the accuracy, are obtained in the cases of ideal synthetic data for both textured and texture- 
less surfaces. These results also demonstrate that the new CE performs substantially better 
than implementations based on the classical CE, in cases where the classical CE is only an 
approximation, and that the new method is able to recover the motion and structure param- 
eters for textureless surfaces, a case that cannot be solved by implementations of the classical 
CE. Very good results are obtained for quantized synthetic images from which the spatiotem- 
poral gradients are estimated, and promising performance was achieved for real data, although 
it was impossible to rigorously assess the accuracy of the results due to the uncertainty in 
measurements of the experimental parameters. 

The new results are not obtained at a trivial computational cost. The formulation is over an 
order of magnitude more complicated than the classical CE case and no closed-form solution 
exists even in the planar case. The convergence is typically very slow, requiring as many as 
16000 iterations for textureless surfaces, and the system does not automatically converge to 
the correct solution given an arbitrary initial value. However, those characteristics are shared 
with iterative implementations of the CE, although the tolerance is tighter in the FICE case 
due to the high order of the nonlinearities. As a result, and as a consequence of the much 
higher complexity of the implementation and much higher computational cost, the full FICE 
implementation should only be used in cases where it is critical to determine the most accurate 
estimate possible or when the surface markings are too weak for reliable use of a less expensive 
method. In particular, this method is inappropriate in a feedback loop system, where only an 
approximate solution is required, and the additional computational burden is not justified. 

Secondly, the thesis discusses, qualitatively and quantitatively, the importance and relevance 
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of the shading information in the case of an object moving with respect to a light source. In 
particular, the CE is evaluated analytically to determine its validity and the error as the texture 
scale and the magnitude of rigid motion vary. The approximation was found to worsen as the 
rotational component of the motion increased, even when the surface markings are strong and 
there is high contrast at high frequencies. The classical CE is a very good approximation 
for highly textured surfaces in the passive navigation case, because the objects are fixed with 
respect to the light sources and there is no motion induced shading. On the other hand, it is, 
at best, only a fair approximation in cases where the object moves with respect to the camera, 
especially for motion where the rotational motion is substantial. 

Thirdly, the thesis proposes an efficient multiple-frame algorithm that allows the global esti- 
mation of the motion and structure from a block of frames. The task is achieved by considering 
a constraint equation which directly finks a block of frames by stacking the unwarped frames 
into a single spatiotemporal block of data and by globally minimizing the data. The method 
assumes that the rigid motion is constant within the spatiotemporal block, but this assumption 
can be relaxed to allow for situations where the motion is varying slowly. The DFU multiframe 
implementation of the FICE is found to be very efficient, because a lot of the quantities required 
to estimate the spatiotemporal gradients can be precomputed for each frame and the unwarping 
operation simply amounts to temporally summing the offset frames of coefficients (the offset- 
ting or unwarping operations are carried out directly on the arrays, by index calculations, at 
the time of the summation). The experiments show that a substantial gain in performance is 
obtained by the multiple-frame algorithm when dealing with noisy irradiance sequences and 
further demonstrate that only a multiframe method is viable for estimating the motion and 
structure parameters of real data. This highly overconstrained nature of the problem allows for 
a much more robust solution and provides for the required noise reduction. 

A key assumption in the multiframe algorithm is the assumption of constant motion within 
the spatiotemporal analysis window. For all practical purposes, this assumption limits the 
number of frames to perhaps five or seven which may at times not be enough for very noisy 
sequences in order to sufficiently neutralize the ill effects of the noise. An alternate and more 
desirable solution is to summarize the entire past of the sequence by state vectors that are 
updated when a new frame is added to the sequence. The Kalman filtering approach provides 



168 CHAPTER 7. SUMMARY AND CONCLUSIONS 

such a framework and is under investigation by other researchers. 

One important issue that has not been addressed in this thesis is the problem of solution 
uniqueness. It has been shown (Negahdaripour 1986, Negahdaripour 1987) that a motion 
vision problem can have at most three solutions; only planar surfaces and quadratic surfaces 
with negative or zero Gaussian curvature can give rise to ambiguity, and then only under very 
special circumstances. The introduction of the shading component does not affect the multiple 
solution findings and was therefore not discussed in this thesis. 

Another issue that was left aside is the image segmentation problem. In many situations, 
the scene consists of both stationary and independently moving objects and it is necessary to 
segment the image into regions of similar velocity before attempting to recover the motion and 
structure parameters. The segmentation and motion estimation processes can be simultaneous, 
as in the procedure adopted by Adiv (1985), or it can be two separate stages in which the 
segmentation and motion and structure parameters processes are independent. This issue is 
totally independent of the shading problem and is found, in exactly the same terms, in tra- 
ditional motion estimation that does rely on the classical CE. Since this thesis concentrates 
on developing a new class of algorithms, problems common with earlier approaches were not 
examined in detail. 

In conclusion, we develop a new generalized multiframe constraint equation that combines 
shading and motion and provides implementation for various cases. The solutions that are 
obtained are more accurate than the solutions based on the use of the classical CE and the 
new algorithms allow the computation of the motion and structure parameters for textured and 
textureless surfaces. The study made apparent the complexity of formulations that use both 
the shading and motion cues and that lead to systems of vectorial nonlinear equations which are 
difficult to implement. This complexity is the price of more general and more accurate solutions. 
In addition to new algorithms, the thesis carefully examines the classical CE approximation and 
improves the understanding of the tradeoffs in its use. The practical choice might be to stay 
with less expensive, less accurate methods, but it is important to be able to pick the best 
algorithms given a desired accuracy and computational complexity. 



Appendix A 

Finite Motion and Instantaneous 
Motion Optical Flow Equations 



This appendix derives the optical flow equations for the finite motion equation and compares 
them to the instantaneous optical flow equations shown in section 2.2.2. 

Let TZ denote a rotation matrix i.e. 7^72. r = I 3 and T a translation vector. The finite motion 
of a point R of rigid body is defined by the equation 

R' = KR + T (A.l) 

Let r and r' represent the perspective projections of the points R and R' onto the image plane, 
then 

Sr = r' -r = F 



(M) 

j7 (! k - i *)- + ! t 



but 



hence 



or 



Z' = ({UK + T) • 4) = ^(Z(TZt . ft) + FT Z ) 



FZ 

or — 



ZTZr • z + FT 



: ( (7J .(^ l3 ) r+ I (fT _ rTi) ) 



Sr = Z(Kr ■ »)/'f + T./Z {(* - T> + h F T - rT -') ■ (A ' 2) 
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Equation A.2 represents the finite optical flow equation derived from the finite rigid motion 
equation A.l. The instantaneous optical flow equation can be inferred from the finite motion 
optical flow equation A.2 by assuming that the rotation matrix has small rotation angles i.e. 
1Z « I3 + ilSt (assuming a constant rate of rotation) and by assuming that the component of 
the translation along the optical axis is small relative to the distance of the object from the 
camera i.e. (T z <C 1). Under these assumptions 

and 



(-f) 






Tlr » (I 3 - -— )SlrSt 
r 



With these approximations, the finite motion optical flow equation A.2 becomes 



St / rz \^ 1 / „, , 



) 



where V represents the translational velocity (T = VSt for a constant velocity V). As St 

St 
decreases to zero, — tends to f , the optical flow, and equations A.2 and 2.9 are identical. 
St 



Appendix B 

Gradients and Hessians at 
Neighboring Points 



In this section, we will derive the relationship between the spatial gradients (and Hessian) of two 
corresponding points in two frames. In particular, we show that the gradients (and Hessians) 
are approximately equal in the two fields under some conditions. 

Lets us denote by d(x, t) = (u(t) , v(t )) T the displacement field at the point x between the 
irradiance field .E(x,*i) = #i(x) at time t x and the irradiance field E(x,t 2 ) = E 2 (x) at time 
t 2 . The displacement field between the two irradiance frames over the domain D is denned by 
the equation: 

£(x + d(x,f 1 ),f 1 ) = E(x,f 2 ) for VxGD (B.l) 

If the displacement field is constant over a neighborhood N of x, we have the relationships 

VxJ^x + d) = V x £ 2 (x) 
VV^x+d) = VV x £ 2 (x) 

for all x in the neighborhood N and we can expect a similar approximate relationship for a 
"nearly" constant or slowly varying displacement field in the neighborhood N. 

Proposition B.l Let -Ei(x) and E 2 ('x.) be two vector fields, twice differentiable, and d(x) a 
vectorial disparity field, also twice differentiable, defined by the relation i?i(x+ d(x)) = E 2 (x). 
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The spatial gradients and Hessian are given by 

V x £ 2 (x) = V x £x(x) + (^^) V x £ x (x) (B.3) 



and 



VV x 2? 2 (x) = VV x E x (x) + (M^) r Vx ^(x) + VV x u^l + VV x x>^^ (B.4) 

Proof: Taking the partial derivatives of equation B.l and applying the chain rule yields 

8E 2 9E ln ^ x 0£i 

dx dx dy m c-n 

dE 2 dEx dE lt . ^^ 

-dy~ = -dx- U » + -di( 1 + V ^ 
where u x and u y denote the partial derivative of the scalar u with respect to x and y, and 
similarly for v. Rearranging the terms and using the gradient vector gives, i.e. 



XX 



(B.6) 



V x £ 2 = V x E-l + I I V x £ x , 

V % v s/ / 

that can be rewritten in full matrix notation as equation B.3. 

Differentiating equation B.5 with respect to x and y we get 

d 2 E 2 d 2 E 1( ^ dEx , d 2 E x , dEh. 

7* = -dx^ {1 + Ux) + -dx~ U ** + ~dxV V * + ~dy~ V ' 

d 2 E 2 d 2 E 1( ^ dEx d 2 E x p dEx 

-dxj = -d^y~ il + Ux) + -dx~ u ** + ~W Vx + ~dy~ 

d 2 E 2 d 2 Ex dEx 8 2 Ei„ , OEx 

IxJ = ~M U » + -dx~ U *v + ^ 1 + ^ + -dy~ V ** 

8 2 E 2 B 2 Ex BEx d 2 E lr . 6Ex 

~W = ~M Uy + ~dx~ Uyy + ~dxJ {1 + Vy) + ~bV Vyy 

Equation B.6 can be written in vector form, 

VV X £ 2 =( 1 + U * Vx ) VV X £ X + VV X «^ + VV X 3 (B.7) 

V u y l + v y J dx dy 

which is equivalent to equation B.4. 

If we assume that ||V x u|| and ||V x w|| are small, i.e. \u x \, \u y \, \v x \, \v y \ < 1, then equa- 
tion B.3 simplifies to V x i?i(x) w V x E 2 (x). If, in addition, if we assume that || VV x u|| and 
||VV x t>|| are small, i.e. all the second-order components of the disparity field are small, then 
equation B.4 simplies to VV x J5i(x) « VV x E 2 (x). The previous conditions on the disparity 
field d(x) are met when d(x) is locally constant, translational or slowly varying. 
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Matrix A and Stencil Computation 



This appendix symbolically computes the matrix A in terms of the orthogonal basis functions for 

a general spatiotemporal window, and presents simplified expressions in the cases of symmetric 

spatial windows and symmetric spatiotemporal windows. 

The symmetric matrix A is denned by A = V^ Wf T where ¥ is the vector of orthogonal 

w 
basis functions, )P T = (1 x y t xt yt xy x 2 y 2 ) and W represents the spatio-temporal 

analysis window. The explicit summation indices will be omitted in the following development 

in order to simplify the equations. 

For a general window W, the symmetric matrix A is given by 
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Since most windows are spatially symmetric around the central point, the matrix A is sparse 
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and is given by 
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If the three windows (horizontal, vertical and temporal) are symmetric around the central point, 
and the spatial extend of the two spatial windows is equal to N and the length of the temporal 
window is T, the representation of the matrix A can be further simplified to 
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(C.l) 

where N' = 4N and M = 4(iV/2 + 1). It should be noted that the summations in equation C.l 
are one-dimensional summations along the horizontal or temporal axis, as appropriate, unlike 
the summations in the previous equations that were three-dimensional. These sparse symmetric 
matrices have also sparse symmetric inverses that can be computed easily. 

Equation C.l can be used to compute the matrices for different symmetric windows. The 
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numerical values of the A and A 1 are given below for the 3x3x2 and 5x5x2 symmetric 
windows cases. 
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Appendix D 

Temporal Derivative of Shading 
Models 



D.l Temporal Derivative of General Lambertian Model 

This section derives equation 3.2, the temporal derivative of the general Lambertian shading 
model described by equation 3.1. Let us consider a constant light source of intensity L and 
position 1. Denote by L the light source as seen by the patch at point R 

f L 1-R 



U-RII 



were L = 1 — R is the illumination vector from the surface to the source (see figure D.l) 
The irradiance E(r, t) is proportional to (L • ft) (K constant of proportionality) i.e. 

E(r,t) = K(L ■ ft) ^ E(r,t) ||1 - R|| = K((l - R) • fi) (D.l) 

If we assume that 1 is fixed and differentiate equation D.l with respect to time, we obtain 

^||i-R||-E ((I -^ =Jsr(-A.ik + ((i-R).&)) 

>«=>• £||L|| = ^((-n.(t + u>xR)) + (L-(u>x n))) + E(L. (t + u> x R)) 
<^> E = — [(-n • (t + w x R)) + (L • (w x ft)) + (t • A)(t • (t + « x R))] 
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Figure D.l: Light source, camera and patch geometry 

Replacing L by its value, we obtain 

E =p^([l,u > ,n]-(t.ri)+((a > xR + t).L)(L.n)) 

= w O 1 ''*' h]+{1 ' ^^^ - ((* - (* • t ) t ) • *)) 

D.2 First— Order Approximation of Lambertian Model 

This section derives equation 3.3, the first-order approximation of the temporal shading varia- 
tions of the general Lambertian model described by equation 3.2. 

The square of the norm of the light source-patch vector L = 1 - R is given by 

||L|| 2 = ||l|| 2 -2(i.R)||R||||l|| + ||R|| 2 



1-2(1-R) 



Hill V llill ) 



The first-order Taylor series of the reciprocal of the norm of L is directly computed from the 
previous equation and is expressed by 



m -.|l„-.( 1 + ( ,. 4) M) +0 (M). 



(D.2) 



The first-order expansion of the unit vector L is inferred from equation D.2 and can be 
written, after regrouping the zeroth- and first-order terms, as 



i = i-( ft - ( i^))M + (M) 



(D.3) 
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The L-orthogonal translation vector t^ = t — (t • L)L can similarly be approximated by a 
first-order expression. If we denote by t^ = t — (t • 1)1 the l-orthogonal translation vector and 
use the equation D.3, the first-order Taylor series of t± is given by 

t£ = t{ + ((t • 1)R + (t • R)l - 2(t . !)(1 • R)l) H + o ( W) (D.4) 

The first-order approximation of the term (L • n)[L,w,R] is determined similarly by using 
equation D.3, multiplying out the terms and gathering the zeroth- and first-order terms — 

(L.n)[L,*,R] = [i,^R](l.n-R.i^ (D.5) 

The first-order Taylor series (equation 3.3) is finally determined by plugging back the equa- 
tions D.2, D.3, D.4 and D.5 into the original equation 3.2, multiplying out all the terms and 
collating the zeroth- and first order terms to yield 

E = px(a,P)L Q ( [l,u,,n] + (I • n)fl,u,,R]H - ^-fi ) . 
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Appendix E 



Quadratic Case Shading Equation 



This appendix presents the derivations of the expressions of the generic normal n and unit 
normal fi in terms of the unit normal no and curvatures at Zo, the point of expansion of the 
Taylor series expressing the reciprocal of the depth in terms of the image coordinates. 
A quadratic patch is expressed by 

Z(X,Y) = Z + Z X X + Z Y Y + X -Z XX X 2 + ZxyXY + ^Z YY Y 2 

or 

Z = R • n + Q • d 

where n = (— Zx — Zy 1) T represents the normal at the point Z , Q = {\X 2 XY \Y 2 ) T 
is a quadratic world coordinates vector and d = {—Zxx — Zyy — Zxy) T is the "curvature" 
vector at the point Zo. The reciprocal of the depth can immediately be computed in terms of 
the image coordinates by a second-order Taylor series of 1/Z, 



! = ^(r.n o ) + q.d + 0(||r|| 2 ) 

where q = (|« 2 xy \y 2 ) 1S a quadratic image coordinate vector. 
The generic normal n is defined by 
/ dZ(X,Y) \ 



(E.l) 



n = 



ax 

dZ(X,Y) 

dY 

1 



/ 



■Zx 

-Z Y 

1 



^ / -ZxxX - ZxyY \ 
+ -ZxyX - Zyy 
\ ° / 
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APPENDIX E. QUADRATIC CASE SHADING EQUATION 



and can be rewritten, in vectorial form, 



n = n + HR = n + ZHr 



(E.2) 



where the matrix H is denned by 



H = 



-Zxx -Zxy 
Zxy -Zyy 




y o o o j 

The depth Z can be computed in terms of the image coordinates by a second-order Taylor 
series; from equation E.l we get 

Z = Z (r. n + Z (q-d))- 1 

= Zo(l + r.no + ^o(q-d)) _1 (E.3) 

= Z (l-r.n )-Zo(q-d) + (r.n ) 2 + C>(||r|| 2 ). 

Plugging the value of the depth Z, denned by equation E.3, into equation E.2, and collecting 
the terms up to the second order yields 



n = n + Z (1 - r . n„)Hr + C(||r|| 2 ). 



(E.4) 



In order to compute the unit normal, we need to determine the value of ||n|| 1 . The square 
of the norm of n is given by 



n = 



|n || 2 + 2Z {1 - r- n ) (n • Hr) + Z 2 r r H 2 r + C?(||r|| 2 ) 



(E.5) 



= K II 2 (l + 2^(1 -r.n ) (n - Hr) + ^r r H 2 r) + e?(||r|| 2 ). 
From equation E.5 we can compute a second-order Taylor series of ||n|| _1 : 

H = irar( 1 -irar (1 -';*' )( *'- ,, ' , -5ii3^ HS+ 

| ( jj^j (i - f • *.) VmwtfHr) ) + o(|Hi ! ) 

= ra ('- nii (1 - ? - "» )(ft °- Hr) - ^ rT(H2r - 3H ^ TH )') + °(w ! ) 

(E.6) 
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Appendix F 



Implementation Issues 



In the following expressions, we will use the summation convention of tensor calculus, i.e. there 
is an implicit summation over any index that appears twice in an expression. 

{Mi}y = If ViVjdr + (If Mr J {n X k},- + (II Vj drj {n X k}< 



■~s^~ 



£l(9-»6) £ 2 (3~3) £ 2 (3.->3) 

+ 



(jfjfdrjfftxkMftxk},- 

£j(ii-»i) 
{M 2 }ij = I ViSjr k dr) n k + {hx k}; f // Sjr k dr J n k 



£ 4 (27>-»27) £ 5 (9~9) 



{M 4 }ij = I SiSjr k ridrj n k rn 



£ 6 (81>->36) 

{ei}i = (flEtVidr^ + ihxkyiniEtdT) 

2Lv ' " ^v ' 

£r(3>->3) £»(1<-»1) 

{e 2 }» = III EtSirkdr) n k 

- - — w ' 

£ 9 (9>->9) 

In the previous expressions, £;(j h-> A;) represents the i 4 ' 1 precomputed term has j elements from 
which only k are distinct (symmetry).. All the £;, i.e. 95 numbers, depend only on r, E T and 
Et, and represent all the image information required to solve the problem. 
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